Skip to content

Database Schema

Craft stores all its data in a single PostgreSQL 17 database. The schema is designed around two concerns: efficiently storing and querying orbital and celestial data, and feeding that data into a vector embedding pipeline for semantic search.

The database runs on the timescale/timescaledb-ha:pg17 image, which bundles several extensions beyond what stock PostgreSQL provides:

  • pgvector — adds the vector column type and distance operators (cosine, L2, inner product) used for embedding similarity search
  • pgai — the higher-level extension that manages vectorizer definitions, tracks which rows need re-embedding, and coordinates the vectorizer worker

The pgai extension is not pre-installed in the container image. A one-shot init container (pgai-install) runs at startup, connects to the database, and installs or upgrades the extension to the version bundled with the vectorizer worker image. This ensures the extension version always matches the worker version, avoiding protocol mismatches.

The central table. Each row represents a tracked satellite with its current Two-Line Element set.

ColumnTypePurpose
norad_idinteger (PK)NORAD catalog number — the universal satellite identifier
namevarchar(100)Common name (e.g., “ISS (ZARYA)“)
intl_designatorvarchar(20)International designator (e.g., “1998-067A”)
tle_line1, tle_line2varchar(70)The two TLE lines, stored verbatim
tle_epochtimestamptzWhen the TLE was generated
std_magfloatStandard visual magnitude (intrinsic brightness)
rcsfloatRadar cross-section in square meters
orbit_typevarchar(20)LEO, MEO, GEO, HEO, etc.
object_typevarchar(20)PAYLOAD, ROCKET BODY, DEBRIS, etc.
search_texttextDenormalized blob for the vectorizer (see below)
is_activebooleanWhether the satellite has decayed
updated_attimestamptzLast TLE update time

The NORAD ID is used as the primary key rather than a synthetic auto-increment because it is the globally unique identifier that every external system (CelesTrak, Space-Track, satellite.js) uses to reference a satellite. There is no ambiguity to resolve.

A join table linking satellites to named groups (amateur, visual, stations, weather, starlink, and debris categories). A satellite can belong to multiple groups. Groups are populated automatically during TLE ingestion based on which CelesTrak group file the satellite was fetched from.

A general-purpose table for everything that is not a satellite: stars, planets, deep sky objects, comets, meteor showers, debris with known RA/Dec, radio sources, and custom targets.

ColumnTypePurpose
idinteger (PK)Auto-increment identifier
namevarchar(100)Display name
object_typevarchar(20)star, planet, dso, comet, meteor_shower, debris, custom, radio_source
ra_hours, dec_degreesfloatJ2000 coordinates (fixed objects)
magnitudefloatVisual magnitude
catalog_idvarchar(30)Standard catalog identifier (HIP 91262, M31, NGC 7331)
constellationvarchar(5)IAU constellation abbreviation
descriptiontextFree-text description
search_texttextDenormalized blob for the vectorizer

For comets, additional columns store Keplerian orbital elements:

ColumnPurpose
epoch_jdEpoch as Julian Date
perihelion_auPerihelion distance in AU
eccentricityOrbital eccentricity
inclination_degOrbital inclination
arg_perihelion_degArgument of perihelion
lon_ascending_degLongitude of ascending node
perihelion_year/month/dayPerihelion passage time (year, month, fractional day)
magnitude_g, magnitude_kAbsolute magnitude and slope parameter
designationMPC designation (unique, used for deduplication)

This dual-purpose design avoids having separate tables for stars, DSOs, and comets. The object_type column discriminates, and the orbital element columns are simply null for fixed-position objects. The SkyEngine checks whether orbital elements are present and selects the appropriate computation path accordingly.

Radio frequencies associated with any target type.

ColumnTypePurpose
idinteger (PK)Auto-increment
target_typevarchar(20)Which table the target lives in
target_idvarchar(30)The target’s primary key (as a string)
descriptionvarchar(100)”uplink”, “downlink”, “beacon”, etc.
frequency_mhzfloatFrequency in MHz
modulationvarchar(100)FM, SSB, CW, AFSK, etc.
bandwidth_khzfloatSignal bandwidth

Upcoming and recent rocket launches, sourced from Launch Library 2.

ColumnTypePurpose
idinteger (PK)Auto-increment
ll2_idvarchar(50)Launch Library 2 UUID (unique, for deduplication)
namevarchar(200)Full launch name (e.g., “Falcon 9 Block 5 | Starlink Group 6-70”)
nettimestamptzNo Earlier Than — the scheduled launch time
window_start/endtimestamptzLaunch window boundaries
provider_namevarchar(100)SpaceX, Rocket Lab, etc.
rocket_namevarchar(100)Vehicle name
pad_namevarchar(200)Launch site
pad_lat/lonfloatLaunch site coordinates
mission_*variousMission name, type, orbit, and description
image_url, webcast_url, info_urltextMedia links
is_upcomingbooleanWhether the launch is in the future
search_texttextDenormalized blob for the vectorizer

Predicted and confirmed reentry events, sourced from Space-Track TIP messages.

  • observer_location — the ground station position (latitude, longitude, altitude), used for all topocentric calculations
  • rotor_config — rotctld connection parameters (host, port, min elevation, park position) for each configured rotor
  • tracking_session — audit log of rotor tracking sessions (start/end times, target, notes)
  • catalog_source — tracks the last update time and status for each TLE source group

Each table with a search_text column has a corresponding embedding store table, created automatically by the pgai vectorizer system:

Source tableEmbedding tableRows (typical)
satellitesatellite_embedding_store~22,000
celestial_objectcelestial_object_embedding_store~500
target_frequencytarget_frequency_embedding_store~300
rocket_launchrocket_launch_embedding_store~200

Each embedding store contains a foreign key back to the source row and a vector(1024) column holding the embedding generated by the mxbai-embed-large model. The vectorizer worker maintains these tables automatically — when a source row’s search_text changes, the worker detects it and regenerates the embedding.

Rather than embedding raw column values, each table has a search_text column that assembles a richer representation of the object. For a satellite, this includes:

ISS (ZARYA) 1998-067A (PAYLOAD) groups: stations, visual, amateur
frequencies: 145.8 MHz FM voice downlink, 437.8 MHz APRS

This enriched text means the embedding captures semantic relationships that no single column carries. A search for “amateur radio in space” finds the ISS because “amateur” appears in the group name and “radio” appears in the frequency context, even though neither word is in the satellite’s name or object type.

The build_satellite_search_text, build_launch_search_text, and build_celestial_search_text functions in the search_text service module construct these strings. They query related tables (groups, frequencies) and concatenate everything into a single text blob. This denormalization is the tradeoff: the search_text column duplicates information that exists in normalized form elsewhere, but it gives the vectorizer exactly the context it needs to produce useful embeddings.

The omnisearch endpoint (/api/search) runs a hybrid query:

  1. Semantic path: The query string is embedded using the same mxbai-embed-large model. The resulting vector is compared against each embedding store using cosine distance. Results are ranked by similarity.

  2. Text path: A traditional ILIKE search runs against name columns (and catalog_id for celestial objects). Results get a flat score of 0.5.

  3. Merge: Results from both paths are deduplicated by (target_type, target_id). Objects found by both methods receive a 0.1 score boost and are marked as source “both.”

The search also supports type shortcuts (/sat ISS, /comet atlas, /freq 145.8) that constrain which embedding stores are queried, reducing the search space.

The schema evolves through Alembic migrations, each building on the previous:

RevisionDescription
001Baseline: all core tables (satellite, celestial_object, target_frequency, observer, rotor, tracking, catalog)
002Add pgai vectorizer definitions for satellite, celestial_object, and target_frequency
003Add rocket_launch and reentry_prediction tables
004Enriched vectorizers with search_text columns; add rocket_launch embedding vectorizer
005Add link columns (webcast_url, info_url, ll2_url) to rocket_launch

The vectorizer definitions in migration 002/004 are SQL statements that register vectorizer configurations with the pgai extension. They tell the vectorizer worker which table and column to read, which embedding model to use, and where to store the results. The worker picks up these definitions automatically and begins processing.