Skip to content

Database schema

The platform schema owns every cross-tenant table Velocity needs: CRD mirrors, audit chain, event log, API keys, role bindings, sessions, idempotency cache, archive runs, purge requests, and the work queues that back background sweepers. Per-tenant record data lives in {org}_{app}_{domain}.{object}_{version} schemas provisioned by the hierarchy operator — those tables are not described here.

All platform tables are created by migration 0001. The DDL below is lifted verbatim — what ships in the repo is what runs in your cluster.

Conventions

  • Idempotent DDL. Every CREATE TABLE and CREATE INDEX uses IF NOT EXISTS; every DO $$ ... $$ block guards inner work with IF NOT EXISTS. Re-applying the migrations is a no-op when state is current.
  • Partial unique indexes over WHERE deleted_at IS NULL so soft-deleted rows can re-use the unique value. (Pattern is used in tenant tables; the platform.api_keys table applies the same idea over WHERE revoked_at IS NULL.)
  • UUIDv4 primary keys for entity tables — gen_random_uuid() from pgcrypto, which the migration enables.
  • TIMESTAMPTZ everywhere. No naked TIMESTAMP columns. Insert/update timestamps default to now().
  • JSONB, not JSON. We rely on binary representation for @> containment queries and key deduplication.

CRD mirror tables

The kube informer is the source of truth for CRDs. These tables exist so operators can ad-hoc query “what schemas exist” without round-tripping to the apiserver.

platform.schema_definitions

One row per SchemaDefinition CRD. Composite primary key is the natural path (org, app, domain, object, version).

CREATE TABLE IF NOT EXISTS platform.schema_definitions (
org TEXT NOT NULL,
app TEXT NOT NULL,
domain TEXT NOT NULL,
object TEXT NOT NULL,
version TEXT NOT NULL,
namespace TEXT NOT NULL,
name TEXT NOT NULL,
pg_schema TEXT NOT NULL,
pg_table TEXT NOT NULL,
lifecycle TEXT NOT NULL DEFAULT 'stable',
policy_hash TEXT,
spec JSONB NOT NULL,
status JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (org, app, domain, object, version)
);
CREATE INDEX IF NOT EXISTS idx_schema_definitions_namespace
ON platform.schema_definitions (namespace, name);
CREATE INDEX IF NOT EXISTS idx_schema_definitions_pg
ON platform.schema_definitions (pg_schema, pg_table);

platform.field_definitions

Flat field index, FK’d to the parent schema with ON DELETE CASCADE so a CRD removal cleans up its rows.

CREATE TABLE IF NOT EXISTS platform.field_definitions (
org TEXT NOT NULL,
app TEXT NOT NULL,
domain TEXT NOT NULL,
object TEXT NOT NULL,
version TEXT NOT NULL,
name TEXT NOT NULL,
kind TEXT NOT NULL,
required BOOLEAN NOT NULL DEFAULT false,
unique_field BOOLEAN NOT NULL DEFAULT false,
indexed BOOLEAN NOT NULL DEFAULT false,
searchable BOOLEAN NOT NULL DEFAULT false,
sensitivity TEXT,
spec JSONB NOT NULL,
PRIMARY KEY (org, app, domain, object, version, name),
FOREIGN KEY (org, app, domain, object, version)
REFERENCES platform.schema_definitions(org, app, domain, object, version)
ON DELETE CASCADE
);

Event log (time machine)

platform.event_log is the time-machine hot tier (ADR-004). Every CREATE / UPDATE / DELETE / RESTORE goes here. UPDATEs carry a JSON Patch in diff; CREATEs carry the full record in payload; DELETEs carry NULL.

The table is monthly range-partitioned by occurred_at. The migration bootstraps the current month and the next month so writes never fail before the operator’s partition manager runs.

CREATE TABLE IF NOT EXISTS platform.event_log (
id UUID NOT NULL DEFAULT gen_random_uuid(),
occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
schema_org TEXT NOT NULL, -- "{org}/{app}/{domain}/{object}/{version}"
entity_id UUID,
operation TEXT NOT NULL, -- create|update|delete|restore|...
actor TEXT NOT NULL,
source TEXT NOT NULL DEFAULT 'api', -- api|operator-sync|import|migration
request_id TEXT,
diff JSONB, -- JSON Patch for UPDATE
payload JSONB, -- full record for CREATE; NULL for DELETE
reason TEXT, -- free-text from restore (added in 0004)
PRIMARY KEY (id, occurred_at)
) PARTITION BY RANGE (occurred_at);

Bootstrap partitions:

DO $$
DECLARE
cur_start DATE := date_trunc('month', now())::date;
cur_end DATE := (date_trunc('month', now()) + interval '1 month')::date;
nxt_start DATE := cur_end;
nxt_end DATE := (date_trunc('month', now()) + interval '2 months')::date;
cur_part TEXT := format('event_log_%s', to_char(cur_start, 'YYYY_MM'));
nxt_part TEXT := format('event_log_%s', to_char(nxt_start, 'YYYY_MM'));
BEGIN
IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = cur_part) THEN
EXECUTE format(
'CREATE TABLE platform.%I PARTITION OF platform.event_log FOR VALUES FROM (%L) TO (%L)',
cur_part, cur_start, cur_end
);
END IF;
IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = nxt_part) THEN
EXECUTE format(
'CREATE TABLE platform.%I PARTITION OF platform.event_log FOR VALUES FROM (%L) TO (%L)',
nxt_part, nxt_start, nxt_end
);
END IF;
END
$$;

Indexes target the three query shapes the time-machine and audit replay actually issue:

CREATE INDEX IF NOT EXISTS idx_event_log_entity
ON platform.event_log (entity_id, occurred_at DESC);
CREATE INDEX IF NOT EXISTS idx_event_log_schema_time
ON platform.event_log (schema_org, occurred_at DESC);
CREATE INDEX IF NOT EXISTS idx_event_log_actor_time
ON platform.event_log (actor, occurred_at DESC);

There is deliberately no index on reason — it’s a read-back-only field; adding an index would just cost write bandwidth.

Audit chain (ADR-005)

Two tables collaborate: audit_log is the append-only record, audit_chain_state is the singleton row whose UPDATE serialises chain construction so concurrent writers can’t race.

platform.audit_log

CREATE TABLE IF NOT EXISTS platform.audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
occurred_at TIMESTAMPTZ NOT NULL DEFAULT now(),
actor TEXT NOT NULL,
action TEXT NOT NULL, -- create|read|update|delete|export|restore|...
outcome TEXT NOT NULL, -- success|error|denied|...
schema_org TEXT,
entity_id UUID,
payload JSONB,
prev_hash TEXT,
hash TEXT NOT NULL,
fail_modes JSONB, -- ADR-003 fail-mode audit
request_id TEXT,
reason TEXT,
ticket_ref TEXT
);

Direct INSERT, UPDATE, DELETE, TRUNCATE are revoked from PUBLIC in migration 0003; the only way to write is platform.audit_insert() (see stored procedures).

Indexes cover actor, schema, entity, and outcome timelines — the four facets of every audit lookup.

CREATE INDEX IF NOT EXISTS idx_audit_log_actor_time ON platform.audit_log (actor, occurred_at DESC);
CREATE INDEX IF NOT EXISTS idx_audit_log_schema_time ON platform.audit_log (schema_org, occurred_at DESC);
CREATE INDEX IF NOT EXISTS idx_audit_log_entity_time ON platform.audit_log (entity_id, occurred_at DESC);
CREATE INDEX IF NOT EXISTS idx_audit_log_outcome ON platform.audit_log (outcome, occurred_at DESC);

platform.audit_chain_state

Singleton row keyed on id = 1 with a CHECK constraint that forbids any other row. The audit_insert() proc updates this row to acquire the chain lock.

CREATE TABLE IF NOT EXISTS platform.audit_chain_state (
id INTEGER PRIMARY KEY DEFAULT 1,
last_hash TEXT,
CONSTRAINT audit_chain_state_singleton CHECK (id = 1)
);
INSERT INTO platform.audit_chain_state (id, last_hash)
VALUES (1, NULL)
ON CONFLICT (id) DO NOTHING;

API keys

API key plaintext is never stored — only the SHA-256 hash. The idx_api_keys_hash_active partial unique index keeps lookups O(1) and makes “revoked + re-issued with same plaintext” naturally distinct (the unique only applies to active rows).

CREATE TABLE IF NOT EXISTS platform.api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
namespace TEXT NOT NULL,
actor TEXT NOT NULL,
actor_type TEXT NOT NULL,
key_hash TEXT NOT NULL, -- sha256 hex; plaintext never stored
scopes JSONB NOT NULL DEFAULT '[]'::jsonb,
ip_allowlist JSONB NOT NULL DEFAULT '[]'::jsonb,
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
revoked_at TIMESTAMPTZ
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_api_keys_hash_active
ON platform.api_keys (key_hash)
WHERE revoked_at IS NULL;
CREATE INDEX IF NOT EXISTS idx_api_keys_actor
ON platform.api_keys (actor)
WHERE revoked_at IS NULL;

Role bindings

One row per RoleBinding CRD. The unique index on (namespace, name) matches the k8s identity; the partial index on actor_id keeps the per-actor lookup hot.

CREATE TABLE IF NOT EXISTS platform.role_bindings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
namespace TEXT NOT NULL,
actor_id TEXT NOT NULL,
roles JSONB NOT NULL,
scopes JSONB NOT NULL DEFAULT '[]'::jsonb,
granted_by TEXT,
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
revoked_at TIMESTAMPTZ
);
CREATE INDEX IF NOT EXISTS idx_role_bindings_actor
ON platform.role_bindings (actor_id) WHERE revoked_at IS NULL;
CREATE UNIQUE INDEX IF NOT EXISTS idx_role_bindings_name
ON platform.role_bindings (namespace, name);

Sessions

OIDC session state — the refresh token is encrypted at-rest by the app before insertion, not by Postgres.

CREATE TABLE IF NOT EXISTS platform.sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
actor_id TEXT NOT NULL,
issuer TEXT NOT NULL,
refresh_token TEXT,
id_token_claims JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ NOT NULL,
revoked_at TIMESTAMPTZ
);
CREATE INDEX IF NOT EXISTS idx_sessions_actor
ON platform.sessions (actor_id) WHERE revoked_at IS NULL;
CREATE INDEX IF NOT EXISTS idx_sessions_expiry
ON platform.sessions (expires_at) WHERE revoked_at IS NULL;

Idempotency keys

24-hour TTL. The (key, request_hash) pair detects “same key, different body” and returns 409. The created_at index supports the GC sweeper.

CREATE TABLE IF NOT EXISTS platform.idempotency_keys (
key TEXT PRIMARY KEY,
request_hash TEXT NOT NULL,
response_body JSONB,
response_code INTEGER NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_idempotency_keys_created
ON platform.idempotency_keys (created_at);

Archive runs & purge requests

Operational state for the archive worker. archive_runs is the historical log; purge_requests is the queue the operator walks to perform two-person-rule deletes.

CREATE TABLE IF NOT EXISTS platform.archive_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
policy_name TEXT NOT NULL,
policy_namespace TEXT NOT NULL,
schema_org TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
finished_at TIMESTAMPTZ,
records BIGINT,
bytes BIGINT,
destination TEXT,
outcome TEXT NOT NULL DEFAULT 'running',
error TEXT
);
CREATE INDEX IF NOT EXISTS idx_archive_runs_schema
ON platform.archive_runs (schema_org, started_at DESC);
CREATE TABLE IF NOT EXISTS platform.purge_requests (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
schema_org TEXT NOT NULL,
older_than TIMESTAMPTZ NOT NULL,
estimated_records BIGINT,
requested_at TIMESTAMPTZ NOT NULL DEFAULT now(),
approved_at TIMESTAMPTZ,
approved_by TEXT,
purged_at TIMESTAMPTZ,
purged_records BIGINT,
reason TEXT
);
CREATE INDEX IF NOT EXISTS idx_purge_requests_pending
ON platform.purge_requests (requested_at DESC)
WHERE purged_at IS NULL;

Typesense reap queue

A durable work queue for the blue-green search collection reap. After an alias flip the operator must drop the old concrete collection — but only after a grace period that lets in-flight queries finish. The original implementation used tokio::time::sleep on a detached task; an operator restart during the grace window leaked the old concrete forever. This table makes the work crash-safe.

CREATE TABLE IF NOT EXISTS platform.pending_typesense_reaps (
id BIGSERIAL PRIMARY KEY,
concrete_name TEXT NOT NULL UNIQUE,
alias_name TEXT NOT NULL,
schema_uid TEXT NOT NULL,
enqueued_at TIMESTAMPTZ NOT NULL DEFAULT now(),
reap_after TIMESTAMPTZ NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_pending_typesense_reaps_due
ON platform.pending_typesense_reaps (reap_after);

The sweeper claims due rows via SELECT ... WHERE reap_after <= now() FOR UPDATE SKIP LOCKED, so multiple replicas don’t double-claim.

Anomaly detection

Two tables collaborate. anomaly_scan_state is a high-watermark singleton; the scanner only inspects rows with (occurred_at, id) > (cursor_ts, cursor_id), which gives strict total order even when two audit rows land in the same microsecond. anomaly_alerts is the detection store, with hourly dedupe via a partial unique index.

CREATE TABLE IF NOT EXISTS platform.anomaly_scan_state (
id INTEGER PRIMARY KEY DEFAULT 1,
last_scanned_occurred_at TIMESTAMPTZ,
last_scanned_id UUID,
last_scanned_at TIMESTAMPTZ,
CONSTRAINT anomaly_scan_state_singleton CHECK (id = 1)
);
CREATE TABLE IF NOT EXISTS platform.anomaly_alerts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
detected_at TIMESTAMPTZ NOT NULL DEFAULT now(),
rule TEXT NOT NULL, -- bulk_reader|after_hours|repeated_denials
actor TEXT,
schema_org TEXT,
severity TEXT NOT NULL DEFAULT 'warning',
detail JSONB NOT NULL,
window_start TIMESTAMPTZ NOT NULL,
window_end TIMESTAMPTZ NOT NULL,
delivered BOOLEAN NOT NULL DEFAULT false,
delivered_at TIMESTAMPTZ
);
-- Hourly dedupe: same (rule, actor, schema) within the same UTC hour collapses.
CREATE UNIQUE INDEX IF NOT EXISTS uniq_anomaly_alerts_dedupe
ON platform.anomaly_alerts (
rule,
COALESCE(actor, ''),
COALESCE(schema_org, ''),
date_trunc('hour', (detected_at AT TIME ZONE 'UTC'))
);

Why the AT TIME ZONE 'UTC' cast in the unique index? date_trunc('hour', TIMESTAMPTZ) is only STABLE — the result depends on the session’s TimeZone GUC, which means Postgres won’t allow it in a unique index. The cast to a plain TIMESTAMP makes the expression IMMUTABLE, which is the index requirement.