Runbooks
Operational runbooks for PROTEA critical procedures. Each runbook lists observable symptoms, concrete diagnosis steps with real commands, and a fix sequence that an operator can execute without prior context.
- Deployment Guide
- Secrets management runbook (sops + age onboarding)
- Disaster Recovery
- Stale Job Reaper
- DLQ Triage
- Ngrok Deploy Recovery
- Embedding Worker OOM
- schema_sha_v2 backfill
- schema_sha_v2 rollout (T1.6)
- Observability: OpenTelemetry SDK
- Observability Operator Runbook
- Observability: Loki log aggregation
- Observability: Prometheus metrics
- Process-Based Stack Deployment Guide