Configuration Reference
PROTEA loads its configuration from two sources, merged in this order (later entries win):
1. protea/config/system.yaml (file-based defaults)
2. Environment variables (runtime overrides)
YAML structure
database:
  url: postgresql+psycopg://user:pass@host:5432/dbname
queue:
  amqp_url: amqp://guest:guest@localhost:5672/
storage:
  artifacts_dir: storage/evaluation_artifacts  # cafaeval output
  backend: local                               # "local" | "minio"
  root: storage/artifacts                      # local backend root
  minio:
    endpoint: localhost:9000
    bucket: protea
    access_key: minioadmin
    secret_key: minioadmin
    secure: false                              # true for HTTPS
admin:
  token: protea-admin
Only database.url and queue.amqp_url are strictly required; the
storage and admin sections have working defaults. The file is loaded
by protea.infrastructure.settings.load_settings(project_root) at
startup.
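The "later entries win" merge order can be illustrated with a minimal sketch. It assumes both sources have already been parsed into nested dictionaries. The `deep_merge` helper and the override value are hypothetical, and this is not the body of `load_settings`, whose internals are not shown in this reference.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which values from `override` win over `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that overriding one key keeps its siblings.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Stand-in for the parsed protea/config/system.yaml (file-based defaults).
file_defaults = {
    "database": {"url": "postgresql+psycopg://user:pass@host:5432/dbname"},
    "storage": {"backend": "local", "root": "storage/artifacts"},
}
# Stand-in for values read from environment variables (runtime overrides).
env_overrides = {"storage": {"backend": "minio"}}

settings = deep_merge(file_defaults, env_overrides)
```

Here `settings["storage"]["backend"]` comes from the override, while `settings["storage"]["root"]` keeps its file-based default.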
The storage block drives the ArtifactStore abstraction described
in Infrastructure. With backend: local (the default),
all blobs land under storage/artifacts/ on the API host. Setting
backend: minio activates the S3-compatible path; this requires the
[storage] extra (pip install 'protea[storage]') and a running
MinIO instance (see docker compose --profile storage up). Paths
under storage.* are resolved relative to the project root when they
are not absolute.
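The path rule for storage.* values can be sketched as follows. The helper name `resolve_storage_path` and the project root `/srv/protea` are hypothetical; the sketch only illustrates the "relative to the project root unless absolute" behaviour described above and uses POSIX path semantics.

```python
from pathlib import PurePosixPath


def resolve_storage_path(project_root: PurePosixPath, configured: str) -> PurePosixPath:
    """Absolute paths pass through; relative ones are joined onto the project root."""
    path = PurePosixPath(configured)
    return path if path.is_absolute() else project_root / path


root = PurePosixPath("/srv/protea")  # hypothetical project root
relative = resolve_storage_path(root, "storage/artifacts")
absolute = resolve_storage_path(root, "/mnt/blobs/artifacts")
```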
Environment variable overrides
| Variable | Description |
|---|---|
| | Overrides |
| | Overrides |
| | Overrides |
| | Overrides |
| | Overrides |
| | Overrides |
| | Overrides |
| | Overrides |
| | Overrides |
| | Overrides |
| | When |
| | Shared HS256 secret used to sign and verify |
| | slowapi rate-limit rule for |
| | slowapi rate-limit rule for |
| | slowapi rate-limit rule for |
| | Directory for the on-disk KNN reference cache (embedding + annotation matrices keyed by |
| | Directory for per-PLM PCA projection states ( |
| | GitHub token used by |
| | Per-chunk query count for the numpy KNN backend (forwarded to |
| | Comma-separated CORS allowlist for the FastAPI app (T5.5). Priority: this env var overrides |
| | Number of parallel worker processes for pairwise alignment feature computation inside |
| | Directory for the persistent SQLite alignment cache used by |
Frontend
# apps/web/.env.local
NEXT_PUBLIC_API_URL=http://127.0.0.1:8000
# NEXT_PUBLIC_FARM_API_URL=http://localhost:8801 # override only for non-standard deployments
NEXT_PUBLIC_API_URL is the only variable required for normal
operation. It is injected at build time by Next.js and embedded in the
client bundle.
NEXT_PUBLIC_FARM_API_URL overrides the farm dashboard API origin
when the Next.js app cannot reach it through the default same-origin
proxy (/farm-api/* rewrites to http://localhost:8801 server-side).
Setting this variable is only necessary in non-standard deployments where
the farm API runs on a different host or port (PR #443).
Integration test environment variables
The Docker-based integration test fixture is controlled by:
| Variable | Default | Description |
|---|---|---|
| | | Docker image for the ephemeral Postgres container. |
| | | Database user. |
| | | Database password. |
| | | Database name. |
| | | Host port mapped to container port 5432. |
| | | Seconds to wait for Postgres readiness. |
RabbitMQ management
The RabbitMQ management UI is available at http://localhost:15672 (default
credentials guest / guest). The ten PROTEA queues are:
| Queue | Consumer | Operations |
|---|---|---|
| | QueueConsumer | |
| | QueueConsumer | |
| | QueueConsumer | |
| | QueueConsumer | |
| | OperationConsumer | |
| | OperationConsumer | |
| | QueueConsumer | |
| | OperationConsumer | |
| | OperationConsumer | |
| | QueueConsumer | |
Queues are declared at worker startup and survive broker restarts.
Tuning settings
PROTEA exposes throughput, retry policy and boundary limits through
protea.config.tuning.TuningSettings (pydantic). Values are
resolved per call (defaults < tuning: section in
protea/config/system.yaml < env vars).
Env var convention: PROTEA_TUNING__<group>__<field>. Double
underscore is the path separator (matches pydantic-settings’
env_nested_delimiter) so it never collides with single
underscores inside field names.
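To make the naming convention concrete, here is a minimal stdlib sketch that groups PROTEA_TUNING__<group>__<field> variables into a nested mapping. It is not how TuningSettings is implemented (pydantic-settings does this internally). The function name `tuning_overrides` is hypothetical, lowercasing the keys is an assumption, and values are left as raw strings rather than being type-coerced.

```python
from typing import Mapping

PREFIX = "PROTEA_TUNING__"
DELIMITER = "__"


def tuning_overrides(environ: Mapping[str, str]) -> dict[str, dict[str, str]]:
    """Collect PROTEA_TUNING__<group>__<field> variables as {group: {field: raw_value}}."""
    overrides: dict[str, dict[str, str]] = {}
    for name, raw in environ.items():
        if not name.startswith(PREFIX):
            continue
        # Split at the first double underscore after the prefix; single
        # underscores inside the field name are left alone.
        group, _, field = name[len(PREFIX):].partition(DELIMITER)
        if group and field:
            overrides.setdefault(group.lower(), {})[field.lower()] = raw
    return overrides


example = tuning_overrides({
    "PROTEA_TUNING__QUEUE__PUBLISHER_MAX_ATTEMPTS": "20",
    "PATH": "/usr/bin",  # unrelated variables are ignored
})
```

In a real process the argument would be `os.environ`; a plain dict is used here so the example has no side effects.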
Categories are derived from docs/CONFIG_INVENTORY.md (T-CONF.1
of master plan revision 3) and migrated incrementally in T-CONF.2.
QueueTuning
RabbitMQ publisher and consumer policy.
| Field | Default | Purpose |
|---|---|---|
| | 12 | Maximum retries when publishing to RabbitMQ. 12 attempts cover about 4 minutes of broker downtime with exponential backoff capped at 30 s. |
| | 1.0 | Initial publisher backoff in seconds. Doubles on each attempt. |
| | 5 | Retries after hitting CUDA OOM in the GPU worker. |
| | 5 | Initial OOM backoff in seconds. |
| | 300 | Cap on the OOM backoff in seconds (5 min). |
YAML excerpt:
tuning:
  queue:
    publisher_max_attempts: 12
    oom_max_retries: 5
Env override example:
PROTEA_TUNING__QUEUE__PUBLISHER_MAX_ATTEMPTS=20
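The "about 4 minutes" figure for the publisher defaults can be checked with a short calculation. The sketch below assumes a delay follows every failed attempt, including the last one; if the final attempt is not followed by a sleep, the total is 30 seconds shorter. The function name `backoff_schedule` is hypothetical.

```python
def backoff_schedule(max_attempts: int, initial: float, cap: float) -> list[float]:
    """Delay after each failed attempt: doubles every time, clipped at `cap`."""
    return [min(initial * 2**attempt, cap) for attempt in range(max_attempts)]


# Documented defaults: 12 attempts, 1.0 s initial backoff, 30 s cap.
delays = backoff_schedule(max_attempts=12, initial=1.0, cap=30.0)
total_seconds = sum(delays)  # 1 + 2 + 4 + 8 + 16 + 7 * 30
```

Under these assumptions the delays add up to 241 seconds, which is consistent with roughly four minutes of tolerated broker downtime.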
WorkerTuning
Pool sizes, in-process caches, reaper timeouts, HTTP cache TTL.
| Field | Default | Purpose |
|---|---|---|
| | 20 | SQLAlchemy connection pool size. |
| | 40 | Extra connections allowed during peaks. |
| | 3600 | Recycle connections after N seconds. |
| | 1 | PLM models cached per embeddings process. |
| | 1 | Reference data sets cached per predict process. |
| | 21600 | Hard timeout before marking jobs FAILED in production (6 h). |
| | 3600 | Constructor default of StaleJobReaper. |
| | 1800 | Time without a JobEvent before a job is considered stalled. |
| | 300.0 | Default TTL of the HTTP cache. |
OperationTuning
Module-level chunk and batch sizes used inside operations.
| Field | Default | Purpose |
|---|---|---|
| | 10_000 | Rows per chunk when loading or iterating annotations. |
| | 2_000 | Chunk size for PyArrow streaming / SQLAlchemy yield_per. |
| | 10_000 | Rows per chunk when publishing predictions to the store queue. |
| | 500 | Query chunk size for the numpy KNN backend (caps the memory of the distance matrix). |
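A rough calculation shows why the KNN query chunk size bounds memory. The reference-set size (one million embeddings) and the 4-byte float32 distances are assumptions chosen for illustration, not PROTEA figures, and `distance_matrix_bytes` is a hypothetical helper.

```python
def distance_matrix_bytes(query_chunk: int, n_references: int, itemsize: int = 4) -> int:
    """Bytes needed for a dense (query_chunk x n_references) distance matrix."""
    return query_chunk * n_references * itemsize


# Hypothetical reference set of one million embeddings, float32 distances.
per_chunk = distance_matrix_bytes(query_chunk=500, n_references=1_000_000)
unchunked = distance_matrix_bytes(query_chunk=50_000, n_references=1_000_000)
```

With the default chunk of 500 queries the matrix needs about 2 GB under these assumptions, whereas 50,000 queries processed in one block would need about 200 GB.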
HTTP retry policy and per-source timeouts (UniProt, GOA, QuickGO,
ontology) live inside the respective pydantic payloads
(InsertProteinsPayload, LoadGoaAnnotationsPayload, etc.) by
design: callers pick them per-job rather than as global infra
defaults.
APILimits
HTTP boundary limits enforced at the FastAPI router layer.
| Field | Default | Purpose |
|---|---|---|
| | 52428800 (50 MB) | FASTA upload cap in bytes. Applies to |
| | 500 | Maximum characters per comment in /support. |
| | 20 | Items returned by default in /support/recent. |
| | 100 | Hard page-size cap for the support list endpoints. |
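The upload cap of 52428800 bytes is 50 × 1024 × 1024. A boundary check of this kind could look like the sketch below; the constant and function names are hypothetical, and treating a payload of exactly the cap size as allowed is an assumption, since the router code is not shown here.

```python
MAX_FASTA_UPLOAD_BYTES = 50 * 1024 * 1024  # equals the documented default, 52428800


def exceeds_upload_limit(size_bytes: int, limit: int = MAX_FASTA_UPLOAD_BYTES) -> bool:
    """True when a payload is strictly larger than the configured cap."""
    return size_bytes > limit
```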
Config-exempt: research methodology constants
The following constants are deliberately not in TuningSettings because changing them would shift the canonical numbers reported in the thesis and papers:
- EMBEDDING_PCA_DIM = 16 (core/reranker.py): part of the feature schema contract that protea-contracts will own; it gates compatibility with trained boosters.
- N_THRESHOLDS = 101 (core/metrics.py): CAFA Fmax sweep granularity. Changing it produces non-comparable Fmax numbers.
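To illustrate why the sweep granularity matters, the sketch below builds a threshold grid and takes the best F-score over it. It assumes evenly spaced thresholds over [0, 1] with both endpoints included, which gives a step of 0.01 for 101 points; the actual grid in core/metrics.py may be defined differently. The function names are hypothetical.

```python
N_THRESHOLDS = 101


def threshold_grid(n: int = N_THRESHOLDS) -> list[float]:
    """Evenly spaced decision thresholds over [0, 1], endpoints included."""
    return [i / (n - 1) for i in range(n)]


def fmax(points: list[tuple[float, float]]) -> float:
    """Best F1 over (precision, recall) pairs, one pair per threshold."""
    return max((2 * p * r / (p + r) if p + r else 0.0) for p, r in points)


grid = threshold_grid()
```

Because Fmax is a maximum over the grid, a coarser or finer grid can land on a different best threshold, so scores computed with different values of N_THRESHOLDS are not directly comparable.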
Structural exempt
Format-spec positional indices live in code (e.g. GAF column indices
in core/operations/load_goa_annotations.py). They are not
configurable because doing so would mean PROTEA stops reading the
GAF format.
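As an illustration of format-spec indices, the sketch below reads four fields from a GAF 2.x row by position. The column positions follow the GAF specification, but the constant names, the `parse_gaf_line` helper and the sample row are made up for this example and are not taken from load_goa_annotations.py.

```python
from typing import Optional

# 0-based positions of GAF 2.x columns (hypothetical constant names).
GAF_DB_OBJECT_ID = 1   # column 2: DB Object ID
GAF_GO_ID = 4          # column 5: GO ID
GAF_EVIDENCE_CODE = 6  # column 7: Evidence Code
GAF_ASPECT = 8         # column 9: Aspect


def parse_gaf_line(line: str) -> Optional[dict[str, str]]:
    """Extract a few fields from one GAF row; header and comment lines start with '!'."""
    if line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    return {
        "accession": cols[GAF_DB_OBJECT_ID],
        "go_id": cols[GAF_GO_ID],
        "evidence": cols[GAF_EVIDENCE_CODE],
        "aspect": cols[GAF_ASPECT],
    }


# Illustrative row only; not real annotation data.
row = (
    "UniProtKB\tP12345\tAATM\tenables\tGO:0004069\tPMID:1\tIDA\t\tF"
    "\t\t\tprotein\ttaxon:9986\t20200101\tUniProt\t\t"
)
record = parse_gaf_line(row)
```

Making these positions configurable would only allow the parser to read something that is no longer GAF, which is the reason they stay in code.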