Database Schema
Craft stores all its data in a single PostgreSQL 17 database. The schema is designed around two concerns: efficiently storing and querying orbital and celestial data, and feeding that data into a vector embedding pipeline for semantic search.
Database image and extensions
Section titled “Database image and extensions”The database runs on the timescale/timescaledb-ha:pg17 image, which bundles several extensions beyond what stock PostgreSQL provides:
- pgvector — adds the
vectorcolumn type and distance operators (cosine, L2, inner product) used for embedding similarity search - pgai — the higher-level extension that manages vectorizer definitions, tracks which rows need re-embedding, and coordinates the vectorizer worker
The pgai extension is not pre-installed in the container image. A one-shot init container (pgai-install) runs at startup, connects to the database, and installs or upgrades the extension to the version bundled with the vectorizer worker image. This ensures the extension version always matches the worker version, avoiding protocol mismatches.
Core tables
Section titled “Core tables”satellite
Section titled “satellite”The central table. Each row represents a tracked satellite with its current Two-Line Element set.
| Column | Type | Purpose |
|---|---|---|
norad_id | integer (PK) | NORAD catalog number — the universal satellite identifier |
name | varchar(100) | Common name (e.g., “ISS (ZARYA)“) |
intl_designator | varchar(20) | International designator (e.g., “1998-067A”) |
tle_line1, tle_line2 | varchar(70) | The two TLE lines, stored verbatim |
tle_epoch | timestamptz | When the TLE was generated |
std_mag | float | Standard visual magnitude (intrinsic brightness) |
rcs | float | Radar cross-section in square meters |
orbit_type | varchar(20) | LEO, MEO, GEO, HEO, etc. |
object_type | varchar(20) | PAYLOAD, ROCKET BODY, DEBRIS, etc. |
search_text | text | Denormalized blob for the vectorizer (see below) |
is_active | boolean | Whether the satellite has decayed |
updated_at | timestamptz | Last TLE update time |
The NORAD ID is used as the primary key rather than a synthetic auto-increment because it is the globally unique identifier that every external system (CelesTrak, Space-Track, satellite.js) uses to reference a satellite. There is no ambiguity to resolve.
satellite_group
Section titled “satellite_group”A join table linking satellites to named groups (amateur, visual, stations, weather, starlink, and debris categories). A satellite can belong to multiple groups. Groups are populated automatically during TLE ingestion based on which CelesTrak group file the satellite was fetched from.
celestial_object
Section titled “celestial_object”A general-purpose table for everything that is not a satellite: stars, planets, deep sky objects, comets, meteor showers, debris with known RA/Dec, radio sources, and custom targets.
| Column | Type | Purpose |
|---|---|---|
id | integer (PK) | Auto-increment identifier |
name | varchar(100) | Display name |
object_type | varchar(20) | star, planet, dso, comet, meteor_shower, debris, custom, radio_source |
ra_hours, dec_degrees | float | J2000 coordinates (fixed objects) |
magnitude | float | Visual magnitude |
catalog_id | varchar(30) | Standard catalog identifier (HIP 91262, M31, NGC 7331) |
constellation | varchar(5) | IAU constellation abbreviation |
description | text | Free-text description |
search_text | text | Denormalized blob for the vectorizer |
For comets, additional columns store Keplerian orbital elements:
| Column | Purpose |
|---|---|
epoch_jd | Epoch as Julian Date |
perihelion_au | Perihelion distance in AU |
eccentricity | Orbital eccentricity |
inclination_deg | Orbital inclination |
arg_perihelion_deg | Argument of perihelion |
lon_ascending_deg | Longitude of ascending node |
perihelion_year/month/day | Perihelion passage time (year, month, fractional day) |
magnitude_g, magnitude_k | Absolute magnitude and slope parameter |
designation | MPC designation (unique, used for deduplication) |
This dual-purpose design avoids having separate tables for stars, DSOs, and comets. The object_type column discriminates, and the orbital element columns are simply null for fixed-position objects. The SkyEngine checks whether orbital elements are present and selects the appropriate computation path accordingly.
target_frequency
Section titled “target_frequency”Radio frequencies associated with any target type.
| Column | Type | Purpose |
|---|---|---|
id | integer (PK) | Auto-increment |
target_type | varchar(20) | Which table the target lives in |
target_id | varchar(30) | The target’s primary key (as a string) |
description | varchar(100) | ”uplink”, “downlink”, “beacon”, etc. |
frequency_mhz | float | Frequency in MHz |
modulation | varchar(100) | FM, SSB, CW, AFSK, etc. |
bandwidth_khz | float | Signal bandwidth |
rocket_launch
Section titled “rocket_launch”Upcoming and recent rocket launches, sourced from Launch Library 2.
| Column | Type | Purpose |
|---|---|---|
id | integer (PK) | Auto-increment |
ll2_id | varchar(50) | Launch Library 2 UUID (unique, for deduplication) |
name | varchar(200) | Full launch name (e.g., “Falcon 9 Block 5 | Starlink Group 6-70”) |
net | timestamptz | No Earlier Than — the scheduled launch time |
window_start/end | timestamptz | Launch window boundaries |
provider_name | varchar(100) | SpaceX, Rocket Lab, etc. |
rocket_name | varchar(100) | Vehicle name |
pad_name | varchar(200) | Launch site |
pad_lat/lon | float | Launch site coordinates |
mission_* | various | Mission name, type, orbit, and description |
image_url, webcast_url, info_url | text | Media links |
is_upcoming | boolean | Whether the launch is in the future |
search_text | text | Denormalized blob for the vectorizer |
reentry_prediction
Section titled “reentry_prediction”Predicted and confirmed reentry events, sourced from Space-Track TIP messages.
Additional tables
Section titled “Additional tables”- observer_location — the ground station position (latitude, longitude, altitude), used for all topocentric calculations
- rotor_config — rotctld connection parameters (host, port, min elevation, park position) for each configured rotor
- tracking_session — audit log of rotor tracking sessions (start/end times, target, notes)
- catalog_source — tracks the last update time and status for each TLE source group
Embedding stores
Section titled “Embedding stores”Each table with a search_text column has a corresponding embedding store table, created automatically by the pgai vectorizer system:
| Source table | Embedding table | Rows (typical) |
|---|---|---|
satellite | satellite_embedding_store | ~22,000 |
celestial_object | celestial_object_embedding_store | ~500 |
target_frequency | target_frequency_embedding_store | ~300 |
rocket_launch | rocket_launch_embedding_store | ~200 |
Each embedding store contains a foreign key back to the source row and a vector(1024) column holding the embedding generated by the mxbai-embed-large model. The vectorizer worker maintains these tables automatically — when a source row’s search_text changes, the worker detects it and regenerates the embedding.
The search_text pattern
Section titled “The search_text pattern”Rather than embedding raw column values, each table has a search_text column that assembles a richer representation of the object. For a satellite, this includes:
ISS (ZARYA) 1998-067A (PAYLOAD) groups: stations, visual, amateurfrequencies: 145.8 MHz FM voice downlink, 437.8 MHz APRSThis enriched text means the embedding captures semantic relationships that no single column carries. A search for “amateur radio in space” finds the ISS because “amateur” appears in the group name and “radio” appears in the frequency context, even though neither word is in the satellite’s name or object type.
The build_satellite_search_text, build_launch_search_text, and build_celestial_search_text functions in the search_text service module construct these strings. They query related tables (groups, frequencies) and concatenate everything into a single text blob. This denormalization is the tradeoff: the search_text column duplicates information that exists in normalized form elsewhere, but it gives the vectorizer exactly the context it needs to produce useful embeddings.
How omnisearch uses the schema
Section titled “How omnisearch uses the schema”The omnisearch endpoint (/api/search) runs a hybrid query:
-
Semantic path: The query string is embedded using the same mxbai-embed-large model. The resulting vector is compared against each embedding store using cosine distance. Results are ranked by similarity.
-
Text path: A traditional ILIKE search runs against name columns (and catalog_id for celestial objects). Results get a flat score of 0.5.
-
Merge: Results from both paths are deduplicated by
(target_type, target_id). Objects found by both methods receive a 0.1 score boost and are marked as source “both.”
The search also supports type shortcuts (/sat ISS, /comet atlas, /freq 145.8) that constrain which embedding stores are queried, reducing the search space.
Migration chain
Section titled “Migration chain”The schema evolves through Alembic migrations, each building on the previous:
| Revision | Description |
|---|---|
001 | Baseline: all core tables (satellite, celestial_object, target_frequency, observer, rotor, tracking, catalog) |
002 | Add pgai vectorizer definitions for satellite, celestial_object, and target_frequency |
003 | Add rocket_launch and reentry_prediction tables |
004 | Enriched vectorizers with search_text columns; add rocket_launch embedding vectorizer |
005 | Add link columns (webcast_url, info_url, ll2_url) to rocket_launch |
The vectorizer definitions in migration 002/004 are SQL statements that register vectorizer configurations with the pgai extension. They tell the vectorizer worker which table and column to read, which embedding model to use, and where to store the results. The worker picks up these definitions automatically and begins processing.