Restore from Backup
Use this runbook when you need to restore Postgres to a point in time from an S3 backup.
Scenarios
- Data corruption: Undetected DDL error or buggy data migration
- Accidental deletion: Someone truncated a table
- Ransomware: Attacker modified records; restore to clean point
- Test data spillage: Prod data was accidentally overwritten with test data
Prerequisites
- CNPG cluster healthy (if only restoring one table)
- S3 backup available and accessible
- pgBackRest client installed locally
- Access to Kubernetes cluster
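The prerequisites above can be partly checked before starting. The following is a minimal sketch, not part of the original runbook: the function names `missing_tools` and `preflight` are hypothetical, and the tool list in the usage line is an assumption to adjust for your environment.

```bash
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed only when every listed tool is available; otherwise report what is missing.
preflight() {
  missing=$(missing_tools "$@")
  if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
    return 1
  fi
}

# Usage (tool list is an assumption):
# preflight kubectl aws pgbackrest pg_dump pg_restore
```

This only confirms that the binaries exist. Cluster access and S3 credentials still need to be verified separately.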
Step 1: Identify Backup Point
List available backups:
# List backups in S3
aws s3 ls s3://velocity-backups/postgres/ --recursive | grep backup.info
# Output:
# 2026-05-19 14:32 backup-2026-05-19_14-32-56.tar
# 2026-05-18 02:15 backup-2026-05-18_02-15-30.tar
Or use CNPG’s backup listing:
kubectl get backups -n velocity-system -o wide
Output:
NAME                       BACKUP PHASE   REFERENCE NAME
velocity-20260519-143256   succeeded      velocity
velocity-20260518-021530   succeeded      velocity
Determine target restore point:
# Restore to latest backup
BACKUP_ID=velocity-20260519-143256

# Or restore to specific timestamp
# (must be within WAL archival window, typically 30 days)
RESTORE_TIME="2026-05-19 10:00:00"
Step 2: Create Restore Cluster (Preferred Method)
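Before creating the restore cluster for a time-based restore, it can help to confirm that the target time falls inside the WAL archival window. The sketch below is illustrative only: `within_window` is a hypothetical helper, the 30-day default mirrors the "typically 30 days" figure from Step 1, and the usage line assumes GNU `date` and treats `RESTORE_TIME` as UTC.

```bash
# within_window TARGET_EPOCH NOW_EPOCH [DAYS]
# Succeeds when TARGET is not in the future and is at most DAYS days old (default 30).
within_window() {
  target=$1 now=$2 days=${3:-30}
  oldest=$(( now - days * 86400 ))
  [ "$target" -le "$now" ] && [ "$target" -ge "$oldest" ]
}

# Usage (GNU date assumed for -d parsing):
# within_window "$(date -u -d "$RESTORE_TIME" +%s)" "$(date -u +%s)" \
#   || echo "RESTORE_TIME is outside the WAL archival window" >&2
```

Taking epoch seconds as arguments keeps the check independent of how the timestamp is parsed, so the same function works with a different retention period or date tool.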
Create a temporary cluster to restore into, then validate:
cat > /tmp/restore-cluster.yaml <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: velocity-restore
  namespace: velocity-system
spec:
  instances: 1
  bootstrap:
    recovery:
      source: velocity
      recoveryTarget:
        backupID: $BACKUP_ID
        # OR for time-based restore:
        # targetTLI: latest
        # exclusive: true
        # backupID: $BACKUP_ID
        # targetTime: "$RESTORE_TIME"
  postgresql:
    parameters:
      shared_buffers: "4GB"
      max_connections: "100"
  storage:
    size: 500Gi
    storageClass: ssd
  externalClusters:
    - name: velocity
      connectionParameters:
        host: velocity-rw.velocity-system.svc.cluster.local
        port: "5432"
        user: postgres
      barmanObjectStore:
        destinationPath: s3://velocity-backups/postgres
        s3Credentials:
          accessKeyId:
            name: aws-s3-creds
            key: ACCESS_KEY_ID
          secretAccessKey:
            name: aws-s3-creds
            key: SECRET_ACCESS_KEY
EOF
kubectl apply -f /tmp/restore-cluster.yaml
Wait for restoration:
kubectl wait --for=condition=ready cluster velocity-restore -n velocity-system --timeout=30m
Verify data:
kubectl exec -it velocity-restore-1 -n velocity-system -- psql -U postgres -c \
  "SELECT COUNT(*) FROM acme_supply_chain_procurement.purchase_order_v1;"
Step 3: Validate Restored Data
Run sanity checks:
# Check audit chain integrity
kubectl exec -it velocity-restore-1 -n velocity-system -- psql -U postgres -c \
  "SELECT COUNT(*), COUNT(DISTINCT event_hash) FROM platform.audit_log;"
# row_count should equal distinct event_hash count (no duplicates)

# Check for data anomalies
kubectl exec -it velocity-restore-1 -n velocity-system -- psql -U postgres -c \
  "SELECT COUNT(*) as deleted_records FROM acme_supply_chain_procurement.purchase_order_v1 WHERE deleted_at IS NOT NULL;"

# Compare record counts with production
kubectl exec -it velocity-1 -n velocity-system -- psql -U postgres -c \
  "SELECT COUNT(*) as prod_count FROM acme_supply_chain_procurement.purchase_order_v1;"

kubectl exec -it velocity-restore-1 -n velocity-system -- psql -U postgres -c \
  "SELECT COUNT(*) as restore_count FROM acme_supply_chain_procurement.purchase_order_v1;"
# If counts match, restoration is good
Step 4: Swap Production (If Validation Passes)
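One way to make the "if validation passes" condition explicit is to capture the Step 3 numbers and gate the swap on them. This is a minimal sketch under stated assumptions: `is_uint`, `counts_match`, and `validation_gate` are hypothetical helpers, and the values are assumed to be captured as bare integers, for example with `psql -At -c "..."`.

```bash
# Succeed when the argument is a non-empty string of digits.
is_uint() { case $1 in ''|*[!0-9]*) return 1 ;; esac; }

# Succeed when both arguments are non-negative integers with the same value.
counts_match() {
  is_uint "$1" && is_uint "$2" && [ "$1" -eq "$2" ]
}

# validation_gate TOTAL DISTINCT PROD RESTORE
# TOTAL and DISTINCT come from the audit_log check; PROD and RESTORE from the
# record count comparison between production and the restored cluster.
validation_gate() {
  counts_match "$1" "$2" || { echo "audit_log check failed: total=$1 distinct=$2" >&2; return 1; }
  counts_match "$3" "$4" || { echo "record counts differ: prod=$3 restore=$4" >&2; return 1; }
  echo "validation passed"
}

# Usage:
# validation_gate "$total" "$distinct" "$prod_count" "$restore_count" || exit 1
```

Rejecting empty or non-numeric input matters here: a failed `kubectl exec` would otherwise produce two empty strings that compare as equal.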
Once validated, swap the production cluster:
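# 1. Scale down API to 0 (prevent writes)
kubectl scale deployment velocity-api -n velocity-system --replicas=0

# 2. Rename clusters
# Caution: metadata.name is immutable in Kubernetes, so the API server rejects a patch
# that changes it. If these commands fail, point velocity-api at the velocity-restore-rw
# service instead and adjust the cluster names used in Step 5.
kubectl patch cluster velocity -n velocity-system -p '{"metadata":{"name":"velocity-old"}}'
kubectl patch cluster velocity-restore -n velocity-system -p '{"metadata":{"name":"velocity"}}'

# 3. Restart API
kubectl scale deployment velocity-api -n velocity-system --replicas=3

# 4. Verify API health (Deployments report an Available condition, not Ready)
kubectl wait --for=condition=available deployment velocity-api -n velocity-system --timeout=5m
velocity status
Step 5: Cleanup Old Cluster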
Once confirmed, delete the old cluster:
# Keep the old cluster for 24 hours in case rollback is needed, then delete it
kubectl delete cluster velocity-old -n velocity-system

# If everything is stable, delete the PVC too
kubectl delete pvc velocity-old-1 -n velocity-system
Alternative: Restore Single Table
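The commands in this section name the table in schema-qualified form. `pg_dump -t` accepts that form, but `pg_restore` selects the schema with `-n` and the table with `-t`, so it can be convenient to split the name once up front. The helper below is a hypothetical sketch: `split_qualified` is not part of any tool, and it does not handle quoted identifiers that contain dots.

```bash
# split_qualified NAME
# Print "schema table" for a NAME of the form schema.table.
# Fails when NAME is empty, has no dot, has more than one dot, or has an empty part.
split_qualified() {
  case $1 in
    ''|*.*.*|.*|*.) return 1 ;;
    *.*) printf '%s %s\n' "${1%%.*}" "${1#*.}" ;;
    *) return 1 ;;
  esac
}

# Usage:
# set -- $(split_qualified "acme_supply_chain_procurement.purchase_order_v1")
# schema=$1 table=$2
```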
If only one table is corrupted:
# 1. Export table from backup (without modifying production)
pg_dump -h velocity-restore-rw.velocity-system.svc.cluster.local \
  -U postgres \
  -t acme_supply_chain_procurement.purchase_order_v1 \
  -F c velocity > /tmp/purchase_order_backup.dump

# 2. Truncate corrupted table in production
kubectl exec -it velocity-1 -n velocity-system -- psql -U postgres -c \
  "TRUNCATE acme_supply_chain_procurement.purchase_order_v1;"

# 3. Restore from dump (data only, because the truncated table still exists)
pg_restore -h velocity-rw.velocity-system.svc.cluster.local \
  -U postgres \
  -d velocity \
  --data-only \
  -n acme_supply_chain_procurement \
  -t purchase_order_v1 \
  /tmp/purchase_order_backup.dump

# 4. Verify integrity
velocity audit verify --schema acme/supply-chain/procurement/purchase-order/v1
Data Loss Assessment
Determine what data was lost:
# Query audit log for when corruption occurred
kubectl exec velocity-1 -n velocity-system -- psql -U postgres -c \
  "SELECT event_id, timestamp, actor, operation, entity_id
   FROM platform.audit_log
   WHERE schema_path = 'acme/supply-chain/procurement/purchase-order/v1'
     AND timestamp > '2026-05-19 00:00:00'
   ORDER BY event_id DESC
   LIMIT 20;"

# Compare with restored data
kubectl exec velocity-restore-1 -n velocity-system -- psql -U postgres -c \
  "SELECT COUNT(*), MAX(created_at) FROM acme_supply_chain_procurement.purchase_order_v1;"
Post-Restoration Checklist
- Verify API is responding (curl /healthz)
- Confirm record counts match expected
- Run velocity audit verify on affected schemas
- Notify affected users of restoration
- Review backup strategy (how did this happen?)
- File incident ticket with timeline
- Update on-call playbook if steps were unclear
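The first checklist item can be scripted so the API gets a few attempts to come back after the swap instead of a single request. This is a rough sketch: `wait_healthy` is a hypothetical helper, and `API_URL` in the usage line is an assumed variable, since the runbook does not state the API's base URL.

```bash
# wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
# Poll URL with curl until it answers with an HTTP status below 400,
# or give up after ATTEMPTS tries (default 10, 3 seconds apart).
wait_healthy() {
  url=$1 attempts=${2:-10} delay=${3:-3}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$(( i + 1 ))
    # Sleep only if another attempt will follow.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  echo "not healthy after $attempts attempt(s)" >&2
  return 1
}

# Usage (API_URL is an assumption):
# wait_healthy "$API_URL/healthz" 10 3
```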
Contacts
- Database Team: #database Slack
- Incident Commander: /page-oncall in Slack