schema_sha_v2 rollout (T1.6)¶
This runbook walks the operator through applying the T1.6 (ADR D10)
schema_sha_v2 migration in production: pre-flight backup, schema
upgrade, idempotent backfill, verification, and rollback. It is the
production counterpart of the dry-run validated by the T1.6
schema_sha_v2 dry-run branch against a clone of the live database.
Pre-flight checklist¶
Take a fresh pg_dump backup before any schema change.
backup_dir="$HOME/Thesis2/backups" ts=$(date -u +%Y%m%d-%H%M%S) out="$backup_dir/pre-t16-$ts.dump" time docker exec protea-postgres-1 pg_dump \ -U protea -F c -d protea > "$out" ls -lh "$out"
The drill clone restored a 1.9G
-F cdump in roughly ten minutes; the equivalent production dump completes in the same order of magnitude on the same hardware. Hold the dump path for the rollback step.Confirm the alembic head matches what the migration expects.
cd ~/Thesis2/repositories/PROTEA poetry run alembic current poetry run alembic heads
Both must print
9d2fb5a07e4c (head)immediately before the upgrade. If they differ, stop and reconcile with the deployer.Quiesce writes to the affected tables. Pause the experiment worker and any pipeline that issues
export_research_dataset,register_reranker, or thereranker-models/importandreranker-models/import-by-referenceendpoints (under the/v1/prefix) until the backfill finishes. The migration itself is additive and non-blocking, but the backfill needs a consistent view of every row.
Apply the migration¶
The migration is the file
alembic/versions/a3c1d8e2f4b6_t1_6_add_schema_sha_v2_columns.py.
It is strictly additive: two ADD COLUMN + two CREATE INDEX
statements. Postgres can perform both without rewriting either table,
so wall time stays in the low seconds even on production-scale data
(measured: under three seconds against the drill clone, which carried
the same row counts as production at the time of the dry run).
cd ~/Thesis2/repositories/PROTEA
export PROTEA_DB_URL="postgresql+psycopg://<user>:<pw>@<host>:<port>/<db>"
poetry run alembic upgrade head
Verify both columns and their indexes landed:
docker exec protea-postgres-1 psql -U protea -d protea -c "\d dataset" \
| grep -E "schema_sha"
docker exec protea-postgres-1 psql -U protea -d protea -c "\d reranker_model" \
| grep -E "schema_sha"
You should see schema_sha and schema_sha_v2 on both tables,
plus ix_dataset_schema_sha_v2 and ix_reranker_model_schema_sha_v2
in the index list.
Backfill¶
The backfill script computes the canonical schema_sha_v2 value via
protea_contracts.compute_schema_sha applied to the running feature
registry, writes it to every row, and commits every --batch-size
rows for resumability. Idempotent: rows where schema_sha_v2 is
already populated are skipped unless --force is supplied.
Step 1: dry-run on production. Prints what would be written without mutating any row.
cd ~/Thesis2/repositories/PROTEA
time poetry run python scripts/backfill_schema_sha_v2.py --dry-run
Step 2: apply.
time poetry run python scripts/backfill_schema_sha_v2.py --batch-size 100
Wall time measured on the drill clone (single-digit row counts at the time of the dry run): under one second end to end. Even a production load of a few thousand rows fits well under one minute because the script reads the canonical SHA once and writes a single column value per row.
Verification¶
The backfill is complete when every row has a populated schema_sha_v2 column.
docker exec protea-postgres-1 psql -U protea -d protea -tAc \
"SELECT count(*) FROM dataset WHERE schema_sha_v2 IS NULL"
# expected: 0
docker exec protea-postgres-1 psql -U protea -d protea -tAc \
"SELECT count(*) FROM reranker_model WHERE schema_sha_v2 IS NULL"
# expected: 0
Spot-check that the original and parallel SHAs agree for fresh exports (rows where the two should match because they were produced after the parity fix):
docker exec protea-postgres-1 psql -U protea -d protea -c \
"SELECT id, name, schema_sha, schema_sha_v2,
(schema_sha = schema_sha_v2) AS matches
FROM dataset ORDER BY created_at DESC LIMIT 10;"
Rows produced before the parity fix will show matches = false;
this is expected, and inference will pick up the schema_sha_v2
column. Drift in rows where both algorithms should have agreed is a
regression and must be investigated before the next export.
Rollback¶
The migration is safe to roll back mid-deploy: dropping the indexes
and columns has no effect on inference, which falls back to the legacy
schema_sha / feature_schema_sha columns when schema_sha_v2
is absent or NULL.
cd ~/Thesis2/repositories/PROTEA
poetry run alembic downgrade -1
Verify the columns are gone:
docker exec protea-postgres-1 psql -U protea -d protea -tAc \
"SELECT count(*) FROM information_schema.columns
WHERE table_name='dataset' AND column_name='schema_sha_v2'"
# expected: 0
If the rollback itself fails (it should not, since the migration is additive), restore from the pre-flight pg_dump:
docker exec protea-postgres-1 psql -U protea -d protea -c "DROP SCHEMA public CASCADE; CREATE SCHEMA public;"
time docker exec -i protea-postgres-1 pg_restore \
-U protea -d protea < "$out"
poetry run alembic stamp 9d2fb5a07e4c
Measured wall times (drill clone)¶
These numbers come from the dry-run executed by the T1.6 branch
against a 1.9G -F c dump of production restored into a
pgvector/pgvector:pg16 container on the same host. Re-measure on
production hardware before quoting them in a change ticket.
pg_restoreof the 1.9G dump: ~10m35s (one-shot, only relevant for dry-run reproducibility).alembic upgrade headthrough the T1.6 migration: ~2.85s (covers three pending revisions; the T1.6 step itself is the last one).backfill_schema_sha_v2.py --dry-runfor 2 datasets + 51 reranker models: ~3.66s.backfill_schema_sha_v2.py(apply): ~1.53s for the same 53 rows. Idempotent second run: ~1.55s (no writes, all rows skipped).alembic downgrade -1: ~0.61s.
Idempotency note¶
Running the backfill twice without --force is a no-op: the script
prints skipped=<n> for each table and exits zero. --force
overwrites existing values; only use it to repair a partial run that
wrote the wrong digest (e.g. against a stale feature registry).