PROTEA stack
Note
This page is generated from docs/source/_data/stack.yaml.
Run python scripts/sync_stack.py to regenerate it.
PROTEA is split across eight repositories. The platform repository (this one) hosts the orchestration, ORM, queue, and HTTP surface; the rest are pluggable contracts, runtime modules, and tooling.
| Repository | Role | Status | Summary |
|---|---|---|---|
| | Platform | | Backend platform. Hosts the ORM, job queue, FastAPI surface, frontend, and orchestration. |
| | Contracts | | Shared contract surface. ABCs, pydantic payloads, feature schema, schema_sha. Imported by every other repo. |
| | Inference | | Pure inference path (KNN, feature compute, reranker apply). Delegation target for the F2C extraction; live in production since F2C.5b. Bind-mounted by the LAFA containers. |
| | Source plugin | | Annotation source plugins (GOA, QuickGO, UniProt). Discovered via Python entry_points (goa, quickgo, uniprot). |
| | Runner plugin | | Experiment runner plugins (LightGBM, KNN, baseline). Discovered via Python entry_points (lightgbm, knn, baseline). |
| | Backend plugin | | Protein language model embedding backends (ESM family, T5/ProstT5, Ankh, ESM3-C). Discovered via Python entry_points (esm, t5, ankh, esm3c). |
| | Lab | | LightGBM reranker training lab. Pulls datasets from PROTEA, trains boosters, publishes them back via /reranker-models/import-by-reference. |
| | Evaluator | | Standalone fork of cafaeval (CAFA-evaluator-PK) with the PK-coverage fix and a bit-exact parity guarantee against the upstream. |
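
The Contracts row mentions a schema_sha but does not say how it is derived. Purely as an illustration of why such a value is useful (it is not PROTEA's actual scheme), a schema fingerprint could be computed by hashing a canonical JSON dump of the feature schema, so that two repositories can cheaply check they agree on the same schema. The function name compute_schema_sha and the example schema fields below are hypothetical.

```python
import hashlib
import json


def compute_schema_sha(feature_schema: dict) -> str:
    """Hypothetical helper: SHA-256 over a canonical JSON dump of the schema.

    Sorting keys and fixing separators makes the digest independent of
    dict insertion order and whitespace.
    """
    canonical = json.dumps(feature_schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative schemas only; these field names are assumptions for the example.
schema_a = {"version": 1, "features": [{"name": "knn_distance", "dtype": "float32"}]}
schema_b = {"features": [{"dtype": "float32", "name": "knn_distance"}], "version": 1}
schema_c = {"version": 1, "features": [{"name": "knn_distance", "dtype": "float64"}]}

if __name__ == "__main__":
    # Same content in a different key order gives the same digest.
    print(compute_schema_sha(schema_a) == compute_schema_sha(schema_b))
    # A changed dtype gives a different digest.
    print(compute_schema_sha(schema_a) == compute_schema_sha(schema_c))
```

Under this assumed scheme, a consumer could compare the digest it was built against with the one reported by a producer and refuse to proceed on a mismatch.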
Cross-cutting concerns
Every other repository depends on protea-contracts as its
shared surface (ABCs, payloads, feature schema). The platform
discovers source, runner, and backend plugins through the Python
entry_points groups protea.sources, protea.runners,
and protea.backends, respectively.