Database Schema

Craft stores all its data in a single PostgreSQL 17 database. The schema is designed around two concerns: efficiently storing and querying orbital and celestial data, and feeding that data into a vector embedding pipeline for semantic search.

Database image and extensions

The database runs on the timescale/timescaledb-ha:pg17 image, which bundles several extensions beyond what stock PostgreSQL provides:

pgvector — adds the vector column type and distance operators (cosine, L2, inner product) used for embedding similarity search
pgai — the higher-level extension that manages vectorizer definitions, tracks which rows need re-embedding, and coordinates the vectorizer worker

The pgai extension is not pre-installed in the container image. A one-shot init container (pgai-install) runs at startup, connects to the database, and installs or upgrades the extension to the version bundled with the vectorizer worker image. This ensures the extension version always matches the worker version, avoiding protocol mismatches.

Core tables

satellite

The central table. Each row represents a tracked satellite with its current Two-Line Element set.

Column	Type	Purpose
`norad_id`	`integer` (PK)	NORAD catalog number — the universal satellite identifier
`name`	`varchar(100)`	Common name (e.g., “ISS (ZARYA)“)
`intl_designator`	`varchar(20)`	International designator (e.g., “1998-067A”)
`tle_line1`, `tle_line2`	`varchar(70)`	The two TLE lines, stored verbatim
`tle_epoch`	`timestamptz`	When the TLE was generated
`std_mag`	`float`	Standard visual magnitude (intrinsic brightness)
`rcs`	`float`	Radar cross-section in square meters
`orbit_type`	`varchar(20)`	LEO, MEO, GEO, HEO, etc.
`object_type`	`varchar(20)`	PAYLOAD, ROCKET BODY, DEBRIS, etc.
`search_text`	`text`	Denormalized blob for the vectorizer (see below)
`is_active`	`boolean`	Whether the satellite has decayed
`updated_at`	`timestamptz`	Last TLE update time

The NORAD ID is used as the primary key rather than a synthetic auto-increment because it is the globally unique identifier that every external system (CelesTrak, Space-Track, satellite.js) uses to reference a satellite. There is no ambiguity to resolve.

satellite_group

A join table linking satellites to named groups (amateur, visual, stations, weather, starlink, and debris categories). A satellite can belong to multiple groups. Groups are populated automatically during TLE ingestion based on which CelesTrak group file the satellite was fetched from.

celestial_object

A general-purpose table for everything that is not a satellite: stars, planets, deep sky objects, comets, meteor showers, debris with known RA/Dec, radio sources, and custom targets.

Column	Type	Purpose
`id`	`integer` (PK)	Auto-increment identifier
`name`	`varchar(100)`	Display name
`object_type`	`varchar(20)`	star, planet, dso, comet, meteor_shower, debris, custom, radio_source
`ra_hours`, `dec_degrees`	`float`	J2000 coordinates (fixed objects)
`magnitude`	`float`	Visual magnitude
`catalog_id`	`varchar(30)`	Standard catalog identifier (HIP 91262, M31, NGC 7331)
`constellation`	`varchar(5)`	IAU constellation abbreviation
`description`	`text`	Free-text description
`search_text`	`text`	Denormalized blob for the vectorizer

For comets, additional columns store Keplerian orbital elements:

Column	Purpose
`epoch_jd`	Epoch as Julian Date
`perihelion_au`	Perihelion distance in AU
`eccentricity`	Orbital eccentricity
`inclination_deg`	Orbital inclination
`arg_perihelion_deg`	Argument of perihelion
`lon_ascending_deg`	Longitude of ascending node
`perihelion_year/month/day`	Perihelion passage time (year, month, fractional day)
`magnitude_g`, `magnitude_k`	Absolute magnitude and slope parameter
`designation`	MPC designation (unique, used for deduplication)

This dual-purpose design avoids having separate tables for stars, DSOs, and comets. The object_type column discriminates, and the orbital element columns are simply null for fixed-position objects. The SkyEngine checks whether orbital elements are present and selects the appropriate computation path accordingly.

target_frequency

Radio frequencies associated with any target type.

Column	Type	Purpose
`id`	`integer` (PK)	Auto-increment
`target_type`	`varchar(20)`	Which table the target lives in
`target_id`	`varchar(30)`	The target’s primary key (as a string)
`description`	`varchar(100)`	”uplink”, “downlink”, “beacon”, etc.
`frequency_mhz`	`float`	Frequency in MHz
`modulation`	`varchar(100)`	FM, SSB, CW, AFSK, etc.
`bandwidth_khz`	`float`	Signal bandwidth

rocket_launch

Upcoming and recent rocket launches, sourced from Launch Library 2.

Column	Type	Purpose
`id`	`integer` (PK)	Auto-increment
`ll2_id`	`varchar(50)`	Launch Library 2 UUID (unique, for deduplication)
`name`	`varchar(200)`	Full launch name (e.g., “Falcon 9 Block 5 \| Starlink Group 6-70”)
`net`	`timestamptz`	No Earlier Than — the scheduled launch time
`window_start/end`	`timestamptz`	Launch window boundaries
`provider_name`	`varchar(100)`	SpaceX, Rocket Lab, etc.
`rocket_name`	`varchar(100)`	Vehicle name
`pad_name`	`varchar(200)`	Launch site
`pad_lat/lon`	`float`	Launch site coordinates
`mission_*`	various	Mission name, type, orbit, and description
`image_url`, `webcast_url`, `info_url`	`text`	Media links
`is_upcoming`	`boolean`	Whether the launch is in the future
`search_text`	`text`	Denormalized blob for the vectorizer

reentry_prediction

Predicted and confirmed reentry events, sourced from Space-Track TIP messages.

Additional tables

observer_location — the ground station position (latitude, longitude, altitude), used for all topocentric calculations
rotor_config — rotctld connection parameters (host, port, min elevation, park position) for each configured rotor
tracking_session — audit log of rotor tracking sessions (start/end times, target, notes)
catalog_source — tracks the last update time and status for each TLE source group

Embedding stores

Each table with a search_text column has a corresponding embedding store table, created automatically by the pgai vectorizer system:

Source table	Embedding table	Rows (typical)
`satellite`	`satellite_embedding_store`	~22,000
`celestial_object`	`celestial_object_embedding_store`	~500
`target_frequency`	`target_frequency_embedding_store`	~300
`rocket_launch`	`rocket_launch_embedding_store`	~200

Each embedding store contains a foreign key back to the source row and a vector(1024) column holding the embedding generated by the mxbai-embed-large model. The vectorizer worker maintains these tables automatically — when a source row’s search_text changes, the worker detects it and regenerates the embedding.

The search_text pattern

Rather than embedding raw column values, each table has a search_text column that assembles a richer representation of the object. For a satellite, this includes:

ISS (ZARYA) 1998-067A (PAYLOAD) groups: stations, visual, amateur
frequencies: 145.8 MHz FM voice downlink, 437.8 MHz APRS

This enriched text means the embedding captures semantic relationships that no single column carries. A search for “amateur radio in space” finds the ISS because “amateur” appears in the group name and “radio” appears in the frequency context, even though neither word is in the satellite’s name or object type.

The build_satellite_search_text, build_launch_search_text, and build_celestial_search_text functions in the search_text service module construct these strings. They query related tables (groups, frequencies) and concatenate everything into a single text blob. This denormalization is the tradeoff: the search_text column duplicates information that exists in normalized form elsewhere, but it gives the vectorizer exactly the context it needs to produce useful embeddings.

How omnisearch uses the schema

The omnisearch endpoint (/api/search) runs a hybrid query:

Semantic path: The query string is embedded using the same mxbai-embed-large model. The resulting vector is compared against each embedding store using cosine distance. Results are ranked by similarity.
Text path: A traditional ILIKE search runs against name columns (and catalog_id for celestial objects). Results get a flat score of 0.5.
Merge: Results from both paths are deduplicated by (target_type, target_id). Objects found by both methods receive a 0.1 score boost and are marked as source “both.”

The search also supports type shortcuts (/sat ISS, /comet atlas, /freq 145.8) that constrain which embedding stores are queried, reducing the search space.

Migration chain

The schema evolves through Alembic migrations, each building on the previous:

Revision	Description
`001`	Baseline: all core tables (satellite, celestial_object, target_frequency, observer, rotor, tracking, catalog)
`002`	Add pgai vectorizer definitions for satellite, celestial_object, and target_frequency
`003`	Add rocket_launch and reentry_prediction tables
`004`	Enriched vectorizers with search_text columns; add rocket_launch embedding vectorizer
`005`	Add link columns (webcast_url, info_url, ll2_url) to rocket_launch

The vectorizer definitions in migration 002/004 are SQL statements that register vectorizer configurations with the pgai extension. They tell the vectorizer worker which table and column to read, which embedding model to use, and where to store the results. The worker picks up these definitions automatically and begins processing.