Troubleshooting

API Returns 503 REVOCATION_UNAVAILABLE

Symptom:

{"code": "REVOCATION_UNAVAILABLE", "message": "Revocation check unavailable"}

Root cause: Redis is unreachable, and the AuthStrategy is configured with failOpen: false (the default, which is safe).

Solution:

Check Redis health:

kubectl get pod -n velocity-system -l app=redis

If Redis pod is not running, restart it:

kubectl rollout restart statefulset/redis -n velocity-system

Check Redis logs for errors:

kubectl logs -n velocity-system -l app=redis --tail=100

Verify network connectivity from API pod to Redis:

kubectl exec -n velocity-system deployment/velocity-api -- \
  redis-cli -h redis -p 6379 ping

Expected: PONG
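The connectivity check can be wrapped in a small helper for use in runbooks. This is a minimal sketch, assuming the namespace, deployment, and `redis` service name used in this section; `redis_triage` is a hypothetical name, not a Velocity command.

```shell
# Hypothetical helper: report whether Redis answers PING from inside the API pod.
# Assumes the velocity-api deployment and the redis service on port 6379.
redis_triage() {
  local ns="${1:-velocity-system}" reply
  # Run redis-cli inside the API pod; a failed exec leaves reply empty.
  reply=$(kubectl exec -n "$ns" deployment/velocity-api -- \
    redis-cli -h redis -p 6379 ping 2>/dev/null) || true
  if [ "$reply" = "PONG" ]; then
    echo "redis reachable from velocity-api"
  else
    echo "redis unreachable from velocity-api"
    return 1
  fi
}
```

Call it as `redis_triage` (or `redis_triage <namespace>`); a non-zero exit status means the API pod could not get a PONG.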

If the issue persists, check your AuthStrategy configuration:
```
apiVersion: velocity.sh/v1
kind: AuthStrategy
metadata:
  name: jwt-internal
spec:
  revocation:
    failOpen: false  # This is correct (safe default)
```
To allow requests when Redis is down (dangerous), set failOpen: true. Only do this in non-production environments.

API Returns 401 Invalid Bearer Token

Symptom:

{"code": "AUTH_INVALID_TOKEN", "message": "Invalid or expired token"}

Root cause:

Token is expired
Token signature is invalid
Token issuer is not configured in AuthStrategy
JWKS endpoint returned a key that doesn’t match

Solution:

Check token expiration:

# Decode the JWT payload (the middle section, base64url-encoded without padding)
echo "<token>" | cut -d. -f2 | tr '_-' '/+' | base64 -d 2>/dev/null | jq .

Look for the "exp" field, which is in seconds since the Unix epoch. If it is earlier than the current time (date +%s), the token is expired.
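JWT segments are base64url-encoded and usually unpadded, so a plain `base64 -d` can reject them. The sketch below restores the padding and compares exp with the current time. `jwt_payload` and `jwt_exp_status` are hypothetical helper names, and the `sed` extraction assumes exp is a plain integer claim.

```shell
# Decode the payload (second segment) of the JWT passed as $1.
jwt_payload() {
  local seg
  # Translate the base64url alphabet back to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 -d accepts the input.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}

# Print "expired" or "valid" by comparing exp with a reference time
# ($2, in epoch seconds; defaults to the current time).
jwt_exp_status() {
  local exp now
  exp=$(jwt_payload "$1" |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  now="${2:-$(date +%s)}"
  if [ -z "$exp" ]; then
    echo "no exp claim found"
    return 2
  fi
  if [ "$exp" -lt "$now" ]; then
    echo "expired"
  else
    echo "valid"
  fi
}
```

For example, `jwt_exp_status "$TOKEN"` prints `expired` or `valid`. It checks expiry only and does not verify the signature.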

Verify AuthStrategy is configured for your issuer:
```
kubectl get authstrategy -A
kubectl describe authstrategy jwt-internal -n velocity-system
```
Check that spec.issuer matches your token’s iss claim.

Test token verification:
```
curl -H "Authorization: Bearer <token>" \
  https://api.velocity.acme.com/version
```
If it returns 401, the token is invalid. If it returns 200, auth is working.

Check JWKS cache:
```
kubectl logs -n velocity-system -l app=velocity-api | grep -i jwks
```
Should show periodic refreshes every 5 minutes. If you see errors, the JWKS endpoint may be down.

Reconciler Hot-loops / Operator CPU High

Symptom:

Operator pod CPU usage > 50% continuously

Logs show repeated reconciliation of the same schema:

{"message": "reconciling", "schema": "acme/supply-chain/procurement/purchase-order/v1", "attempt": 1234}

Root cause: Schema has drifted from its spec. The operator reconciles, makes changes, detects a diff, reconciles again, etc.
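To see which schema is looping, count the "reconciling" messages per schema. This is a minimal sketch, assuming one JSON log entry per line with the message and schema fields shown in the symptom above; `reconcile_counts` is a hypothetical helper name.

```shell
# Read operator log lines on stdin and print "<count> <schema>" for every
# schema that appears in a "reconciling" message, busiest first.
reconcile_counts() {
  grep '"message": *"reconciling"' |
    sed -n 's/.*"schema": *"\([^"]*\)".*/\1/p' |
    sort | uniq -c | sort -rn |
    awk '{print $1, $2}'
}
```

Feed it the operator logs, for example `kubectl logs -n velocity-system -l app=velocity-operator --tail=1000 | reconcile_counts`. A schema whose count is far above the others is the likely hot-loop.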

Solution:

Check the operator logs for the error:

kubectl logs -n velocity-system -l app=velocity-operator --tail=200 | \
  grep -A5 "error"

Common causes:
- Postgres table has out-of-band changes (e.g., a manual ALTER TABLE by a DBA)
- Operator role lacks permission to create/modify a resource
- A required RLS policy is missing
- Archive or history table is out of sync

Run the drift check:

velocity drift check \
  --schema acme/supply-chain/procurement/purchase-order/v1

This reports which objects have drifted from the spec.

Quarantine the schema to stop reconciliation:

kubectl annotate schemadefinition purchase-order \
  -n acme-supply-chain-procurement \
  velocity.sh/quarantine=true \
  --overwrite

Fix the drift manually (with DBA):

-- Example: add a missing RLS policy
CREATE POLICY acme_supply_chain_procurement_policy
ON acme_supply_chain_procurement.purchase_order_v1
FOR SELECT USING (...);

Remove the quarantine:

kubectl annotate schemadefinition purchase-order \
  -n acme-supply-chain-procurement \
  velocity.sh/quarantine- \
  --overwrite

Trigger a manual reconcile:

velocity reconcile --schema acme/supply-chain/procurement/purchase-order/v1

Outbox Table Grows Unbounded

Symptom:

SELECT COUNT(*) FROM acme_supply_chain_procurement.purchase_order_v1_outbox
WHERE published_at IS NULL;

Returns thousands of unpublished rows.

Root cause: CDC worker is not running or has crashed. Outbox rows are not being sent to Typesense.

Solution:

Check CDC worker health:

kubectl logs -n velocity-system -l app=velocity-api -c cdc-worker --tail=100

Look for panic, error, or connection issues.

Verify Typesense is reachable:

kubectl exec -n velocity-system deployment/velocity-api -- \
  curl https://typesense.example.com:8108/health

Should return status 200 with health info.

Check if the CDC worker is running:

kubectl exec -n velocity-system deployment/velocity-api -c cdc-worker -- \
  ps aux | grep cdc

Should show a running process.

Restart the API deployment:

kubectl rollout restart deployment/velocity-api -n velocity-system

Monitor the outbox shrinking:

watch -n 5 'psql -U velocity_api -d velocity -c "SELECT COUNT(*) FROM acme_supply_chain_procurement.purchase_order_v1_outbox WHERE published_at IS NULL;"'

The count should decrease as rows are processed.
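If watching the counter by eye is impractical, two samples taken a fixed interval apart give the same answer. This is a minimal sketch, assuming the velocity_api role and velocity database used above; `outbox_backlog` and `outbox_trend` are hypothetical helper names.

```shell
# Print the number of unpublished rows in the outbox table named by $1.
# -A and -t make psql print the bare value without headers or alignment.
outbox_backlog() {
  psql -U velocity_api -d velocity -At -c \
    "SELECT COUNT(*) FROM $1 WHERE published_at IS NULL;"
}

# Sample the backlog twice, $2 seconds apart (default 30), and report the trend.
outbox_trend() {
  local table="$1" interval="${2:-30}" first second
  first=$(outbox_backlog "$table") || return 2
  sleep "$interval"
  second=$(outbox_backlog "$table") || return 2
  if [ "$second" -eq 0 ]; then
    echo "empty: nothing left to publish"
  elif [ "$second" -lt "$first" ]; then
    echo "draining: $first -> $second"
  else
    echo "not draining: $first -> $second"
    return 1
  fi
}
```

For example, `outbox_trend acme_supply_chain_procurement.purchase_order_v1_outbox 60` exits non-zero when the backlog did not shrink within a minute.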

If the count doesn’t shrink, check Typesense logs:
```
kubectl logs -n typesense -l app=typesense --tail=100
```
The collection may be read-only, or there may be a permission issue.

Schema Apply Succeeds but Table Not Created

Symptom:

kubectl apply -f schema.yaml
# Output: schemadefinition.velocity.sh/purchase-order created
# Status shows "Ready"

# But the table doesn't exist:
psql -U velocity_api -d velocity -c \
  "SELECT to_regclass('acme_supply_chain_procurement.purchase_order_v1');"

Returns NULL.

Root cause: Operator does not have permission to CREATE TABLE in the domain schema.

Solution:

Check operator logs:

kubectl logs -n velocity-system -l app=velocity-operator --tail=200 | \
  grep -i "permission\|denied"

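Verify operator’s Postgres role has CREATE permission on the domain schema:

-- As superuser
-- CREATE is a schema-level privilege, so it does not appear in
-- information_schema.role_table_grants (which lists table privileges only).
SELECT has_schema_privilege('velocity_operator', 'acme_supply_chain_procurement', 'CREATE');

Should return t.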

Grant the permission:

-- As superuser
GRANT CREATE ON SCHEMA acme_supply_chain_procurement TO velocity_operator;

Retry the apply:

velocity reconcile --schema acme/supply-chain/procurement/purchase-order/v1

Time Machine Shows Empty History

Symptom:

velocity history list --schema ... --id PO-00000001

Returns 0 events, even though records were created and updated.

Root cause: History trigger is not firing, or history table is missing.

Solution:

Check if the history table exists:

SELECT to_regclass('acme_supply_chain_procurement.purchase_order_v1_history');

If NULL, the operator did not create it. Reconcile the schema:

velocity reconcile --schema acme/supply-chain/procurement/purchase-order/v1

Check if the trigger exists:

SELECT trigger_name FROM information_schema.triggers
WHERE event_object_table = 'purchase_order_v1'
  AND trigger_schema = 'acme_supply_chain_procurement';

Should show a trigger like purchase_order_v1_history_trigger.

Check if the trigger is enabled:

SELECT tgenabled FROM pg_trigger
WHERE tgrelid = 'acme_supply_chain_procurement.purchase_order_v1'::regclass
  AND tgname = 'purchase_order_v1_history_trigger';

Should return O (enabled). If it returns D, the trigger is disabled:

-- Enable it
ALTER TABLE acme_supply_chain_procurement.purchase_order_v1 ENABLE TRIGGER purchase_order_v1_history_trigger;

Manually verify the trigger works by creating a test record:

velocity record create \
  --schema acme/supply-chain/procurement/purchase-order/v1 \
  --data '{"id": "TEST-001", "status": "draft"}'

Check the history table:

SELECT COUNT(*) FROM acme_supply_chain_procurement.purchase_order_v1_history
WHERE entity_id = 'TEST-001';

Should show 1 (the CREATE event).
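The single-letter tgenabled codes from the trigger check above are easy to misread, and R in particular means the trigger fires only in replica sessions, so ordinary writes produce no history. This is a minimal sketch that translates the code, assuming the velocity_api role and velocity database used elsewhere in this guide; `trigger_state` and `history_trigger_state` are hypothetical helper names.

```shell
# Map a pg_trigger.tgenabled code to a human-readable state.
trigger_state() {
  case "$1" in
    O) echo "enabled (origin and local sessions)" ;;
    D) echo "disabled" ;;
    R) echo "enabled (replica sessions only)" ;;
    A) echo "enabled (always)" ;;
    *) echo "unknown or trigger not found"; return 1 ;;
  esac
}

# Look up the state of trigger $2 on the schema-qualified table $1.
history_trigger_state() {
  local table="$1" trigger="$2" code
  code=$(psql -U velocity_api -d velocity -At -c \
    "SELECT tgenabled FROM pg_trigger
     WHERE tgrelid = '$table'::regclass AND tgname = '$trigger';") || return 2
  trigger_state "$code"
}
```

For example: `history_trigger_state acme_supply_chain_procurement.purchase_order_v1 purchase_order_v1_history_trigger`.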

Archive Worker Silently Skips Policy

Symptom:

ArchivePolicy is applied and shows Ready
Archive worker pod is running
No errors in logs
But records are not being archived to S3

Root cause:

Archive policy type is cel (deferred to Phase 10)
S3 destination is not configured
ARCHIVE_S3_BUCKET environment variable is not set

Solution:

Check the archive policy:

kubectl get archivepolicy -A
kubectl describe archivepolicy <name>

Look at spec.trigger.type. If it’s cel, this is deferred:

spec:
  trigger:
    type: cel  # Deferred; only 'age' and 'size' are implemented

Use age or size instead:

spec:
  trigger:
    type: age
    days: 30

Verify S3 bucket is configured:

kubectl get deployment velocity-archive-worker -n velocity-system -o yaml | \
  grep -A5 "ARCHIVE_S3"

If not set, update the Helm values:

archiveWorker:
  env:
    ARCHIVE_S3_BUCKET: velocity-archives
    ARCHIVE_S3_REGION: us-east-1

Reapply Helm:

helm upgrade velocity velocity/velocity \
  --namespace velocity-system \
  --values values.yaml

Check archive run status:

SELECT schema_path, status, started_at, completed_at FROM platform.archive_runs
ORDER BY started_at DESC LIMIT 10;

Should show recent runs with status success.
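Because a policy with an unimplemented trigger type still shows Ready, it can help to list every policy's trigger type in one pass. This is a minimal sketch, assuming the spec.trigger.type field shown above and that only age and size are implemented; `archive_policy_report` is a hypothetical helper name.

```shell
# Print "<namespace>/<name> <type> <ok|SKIPPED>" (tab-separated) for every
# ArchivePolicy, flagging trigger types other than age and size.
archive_policy_report() {
  kubectl get archivepolicy -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.trigger.type}{"\n"}{end}' |
    awk -F'\t' '{
      status = ($2 == "age" || $2 == "size") ? "ok" : "SKIPPED"
      printf "%s\t%s\t%s\n", $1, $2, status
    }'
}
```

Any row marked SKIPPED points at a policy the archive worker will ignore without logging an error.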

velocity context add Fails with “Invalid Bearer Token”

Symptom:

velocity context add \
  --name prod \
  --api-url https://api.velocity.acme.com \
  --bearer-token eyJhbGc...

Error:

Error: invalid bearer token (contains CRLF)

Root cause: The token contains newlines (common when copying from email or docs).

Solution:

Remove newlines from the token:

TOKEN=$(cat token.txt | tr -d '\n\r')
velocity context add --name prod --api-url ... --bearer-token "$TOKEN"

Verify the token has a valid JWT structure (three dot-separated segments):

echo "$TOKEN" | awk -F'.' '{if(NF!=3) exit 1}' && echo "Valid JWT structure"

Retry the context add.
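The cleanup and the structure check can be combined so that a malformed token never reaches the CLI. This is a minimal sketch; `clean_token` is a hypothetical helper name, and it strips all whitespace rather than only CR and LF, which is an assumption about what a pasted token may contain.

```shell
# Strip whitespace (including CR/LF) from the token in $1 and print it,
# failing unless exactly three non-empty dot-separated segments remain.
clean_token() {
  local token
  token=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$token" in
    *.*.*.*)
      echo "not a JWT: more than three segments" >&2
      return 1 ;;
    ?*.?*.?*)
      printf '%s\n' "$token" ;;
    *)
      echo "not a JWT: expected header.payload.signature" >&2
      return 1 ;;
  esac
}
```

For example, `TOKEN=$(clean_token "$(cat token.txt)")` stops with a non-zero status when the file does not hold a three-segment token.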

velocity api-key create Times Out Waiting for Secret

Symptom:

velocity api-key create --name prod --ttl 30d

Hangs for 5+ minutes, then times out:

Error: timeout waiting for secret to appear

Root cause: Operator is not running, or API server cannot create Kubernetes secrets.

Solution:

Check if operator is running:
```
kubectl get deployment velocity-operator -n velocity-system
```
Should show 1 replica.

Check operator logs:

kubectl logs -n velocity-system -l app=velocity-operator --tail=100

Look for FATAL or ERROR.

Check API server RBAC:
```
kubectl describe rolebinding velocity-api -n velocity-system
```
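Should bind the API service account to a Role that allows creating secrets. The binding lists only the Role reference and its subjects; to see the rules themselves, run kubectl describe role on the referenced Role.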
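A more direct check is to ask the Kubernetes API server whether the service account may create secrets, using `kubectl auth can-i` with impersonation. This is a minimal sketch; the service account name velocity-api is an assumption (this guide does not name it), and `can_create_secrets` is a hypothetical helper name.

```shell
# Ask the API server whether service account $2 in namespace $1 may create
# secrets. The default service account name is an assumption; substitute the
# one your velocity-api deployment actually runs as.
can_create_secrets() {
  local ns="${1:-velocity-system}" sa="${2:-velocity-api}" answer
  # can-i prints "yes" or "no" and exits non-zero on "no".
  answer=$(kubectl auth can-i create secrets -n "$ns" \
    --as="system:serviceaccount:${ns}:${sa}" 2>/dev/null) || true
  if [ "$answer" = "yes" ]; then
    echo "ok: ${sa} can create secrets in ${ns}"
  else
    echo "missing RBAC: ${sa} cannot create secrets in ${ns}"
    return 1
  fi
}
```

Running it requires that your own user is allowed to impersonate service accounts.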

If RBAC is missing, add it:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: velocity-api
  namespace: velocity-system
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]

Retry with an explicit wait timeout:

velocity api-key create \
  --name prod \
  --ttl 30d \
  --wait-secs 120

API Server Crashes with “RLS will not work”

Symptom:

API server logs show:

FATAL velocity_api role has BYPASSRLS=true — RLS will not work. Fix the role.

Then the pod restarts continuously.

Root cause: The database role velocity_api has the BYPASSRLS attribute, which exempts it from every row-level security policy, so the API server refuses to start.

Solution:

Connect to Postgres as superuser:

psql -h <pg-host> -U <superuser> -d velocity

Fix the role:
```
ALTER ROLE velocity_api NOBYPASSRLS;
```

Verify:

SELECT rolname, rolbypassrls FROM pg_roles WHERE rolname = 'velocity_api';

Should show rolbypassrls = f.

Restart the API deployment:

kubectl rollout restart deployment/velocity-api -n velocity-system

Tail logs to verify startup:
```
kubectl logs -f -n velocity-system -l app=velocity-api
```
Should show “schema informer ready” when healthy.

Validating Webhook Rejects Valid Manifest

Symptom:

Applying a manifest fails with:

Error from server: error when creating "schema.yaml": admission webhook "schemadefinition.velocity.sh" denied the request: ...

But the manifest looks correct.

Solution:

Get more details:

kubectl apply -f schema.yaml -v=6 2>&1 | grep -A10 "denied"

Common webhook rejections:

Namespace mismatch: the resource must live in the {org}-{app}-{domain} namespace that matches its velocity.sh labels

# Must be in namespace acme-supply-chain-procurement
metadata:
  namespace: acme-supply-chain-procurement
  labels:
    velocity.sh/org: acme
    velocity.sh/app: supply-chain
    velocity.sh/domain: procurement

CEL syntax error: Fix the rule

validation:
  rules:
    - rule: "self.amount >"  # Missing right side

Quota exceeded: Too many SchemaDefinitions in the Application
```
SELECT COUNT(*) FROM platform.schema_definitions
WHERE org = 'acme' AND app = 'supply-chain';
```
If you need more, update the quota in the Application CRD.

Cross-org reference (multi-tenant): In multi-tenant mode, you cannot reference a schema from another org
```
# This is forbidden in multi-tenant:
spec:
  refs:
    - path: other-org/app/domain/schema/v1
```
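For the namespace-mismatch case, the expected namespace can be derived from the schema path before applying the manifest. This is a minimal sketch; the mapping is inferred from the examples in this guide (for instance acme/supply-chain/procurement/purchase-order/v1 living in acme-supply-chain-procurement), not taken from a Velocity specification, and `schema_names` is a hypothetical helper name.

```shell
# Derive the Kubernetes namespace and Postgres table name from a schema path
# of the form org/app/domain/schema/version.
schema_names() {
  local org app domain schema version ns table
  IFS=/ read -r org app domain schema version <<EOF
$1
EOF
  if [ -z "$version" ]; then
    echo "expected org/app/domain/schema/version" >&2
    return 1
  fi
  ns="${org}-${app}-${domain}"
  # Postgres identifiers in this guide use underscores where paths use hyphens.
  table=$(printf '%s.%s_%s' "$ns" "$schema" "$version" | sed 's/-/_/g')
  echo "namespace=$ns"
  echo "table=$table"
}
```

Compare the printed namespace with metadata.namespace in the manifest; the table name is useful for the SQL checks elsewhere in this guide.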

Check webhook configuration:

kubectl get validatingwebhookconfigurations | grep velocity
kubectl describe validatingwebhookconfigurations velocity.sh-schemadefinition

If the webhook is misconfigured, restart the webhook deployment:

kubectl rollout restart deployment/velocity-webhook -n velocity-system

Next Steps

If you can’t find your issue here, check:

Security — Auth and RLS-related issues
Hardening — CEL, input validation
API Reference — REST endpoint issues
Operator logs: kubectl logs -f -n velocity-system -l app=velocity-operator
API logs: kubectl logs -f -n velocity-system -l app=velocity-api

Or reach out to the team.