PROTEA stack

Note

This page is generated from docs/source/_data/stack.yaml. Run python scripts/sync_stack.py to regenerate it.

PROTEA is split across eight repositories. The platform repository (this one) hosts the orchestration, ORM, queue, and HTTP surface; the rest are pluggable contracts, runtime modules, and tooling.

| Repository | Role | Status | Summary |
|---|---|---|---|
| PROTEA | Platform | active | Backend platform. Hosts the ORM, job queue, FastAPI surface, frontend, and orchestration. |
| protea-contracts | Contracts | beta | Shared contract surface. ABCs, pydantic payloads, feature schema, schema_sha. Imported by every other repo. |
| protea-method | Inference | active | Pure inference path (KNN, feature compute, reranker apply). Delegation target for the F2C extraction; live in production since F2C.5b. Bind-mounted by the LAFA containers. |
| protea-sources | Source plugin | active | Annotation source plugins (GOA, QuickGO, UniProt). Discovered via Python entry_points (goa, quickgo, uniprot). |
| protea-runners | Runner plugin | active | Experiment runner plugins (LightGBM, KNN, baseline). Discovered via Python entry_points (lightgbm, knn, baseline). |
| protea-backends | Backend plugin | active | Protein language model embedding backends (ESM family, T5/ProstT5, Ankh, ESM3-C). Discovered via Python entry_points (esm, t5, ankh, esm3c). |
| protea-reranker-lab | Lab | active | LightGBM reranker training lab. Pulls datasets from PROTEA, trains boosters, publishes them back via /reranker-models/import-by-reference. |
| cafaeval-protea | Evaluator | active | Standalone fork of cafaeval (CAFA-evaluator-PK) with the PK-coverage fix and a bit-exact parity guarantee against the upstream. |

Cross-cutting concerns

Every other repository depends on protea-contracts as its shared surface (ABCs, payloads, feature schema). The platform discovers source, runner, and backend plugins through the Python entry_points groups protea.sources, protea.runners, and protea.backends, respectively.