Installation and Quickstart
Prerequisites
Before starting PROTEA you need:
Python 3.12+ with Poetry
PostgreSQL 16 (local or remote)
RabbitMQ 3.x with the management plugin enabled
Node.js 20+ with npm (for the Next.js frontend)
Install dependencies
git clone <repo-url> PROTEA
cd PROTEA
poetry install # runtime only (slimmest install)
poetry install --with lint,test,docs # full local dev environment
The dev tooling is split into three optional Poetry groups so each CI job installs only the packages it needs:
--with lint: ruff, mypy, type stubs, taskipy.
--with test: pytest, pytest-cov, httpx, uvicorn, plus protea-reranker-lab for parity tests.
--with docs: Sphinx, furo, sphinx-copybutton, sphinx-design, shibuya theme, sphinxcontrib-bibtex.
A bare poetry install no longer installs Sphinx or pytest; pick
the groups you need.
Optional extras:
poetry install -E storage # adds the 'minio' client for the
# MinIO artifact-store backend
The [storage] extra is only required when storage.backend: minio
is set in system.yaml (or PROTEA_STORAGE_BACKEND=minio). The
default local-filesystem backend works with the base install.
Configuration
Create protea/config/system.yaml from the example below and adjust it for your environment:
mkdir -p protea/config
cat > protea/config/system.yaml <<EOF
database:
  url: postgresql+psycopg://user:pass@localhost:5432/biodata
queue:
  amqp_url: amqp://guest:guest@localhost:5672/
EOF
Note
system.yaml is not committed to version control. Do not store
production credentials in the repository.
The environment variables PROTEA_DB_URL and PROTEA_AMQP_URL, when set,
take precedence over the corresponding YAML values.
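The precedence rule can be illustrated with a minimal sketch. It assumes the YAML file has already been parsed into a nested dict; the helper name `apply_env_overrides` and the choice to ignore empty variables are illustrative assumptions, not PROTEA's actual loader.

```python
import os
from typing import Mapping

# (section, key) in system.yaml -> environment variable that overrides it.
ENV_OVERRIDES = {
    ("database", "url"): "PROTEA_DB_URL",
    ("queue", "amqp_url"): "PROTEA_AMQP_URL",
}


def apply_env_overrides(config: dict, environ: Mapping[str, str] = os.environ) -> dict:
    """Return a copy of `config` where set environment variables win over YAML."""
    # Copy one level deep so the caller's dict is left untouched.
    merged = {section: dict(values) for section, values in config.items()}
    for (section, key), var in ENV_OVERRIDES.items():
        value = environ.get(var)
        if value:  # assumption: unset or empty variables keep the YAML value
            merged.setdefault(section, {})[key] = value
    return merged
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.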
Frontend configuration:
echo "NEXT_PUBLIC_API_URL=http://127.0.0.1:8000" > apps/web/.env.local
Bring up infrastructure
Postgres, RabbitMQ and MinIO run under Docker Compose; the application
itself runs directly on the host (see “Start the application stack”).
This split lets the application hot-reload on file changes while the
infrastructure versions stay pinned.
# Bring up postgres, rabbitmq and minio (the storage profile activates MinIO)
docker compose --profile storage up -d postgres rabbitmq minio
# Wait for healthchecks
docker compose --profile storage ps
The MinIO console is then available at http://localhost:9001
(default credentials minioadmin / minioadmin).
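As a complement to the Compose healthchecks, a quick TCP reachability probe can confirm that the published ports accept connections. This is a minimal sketch: the port numbers are taken from the example URLs in this guide (5432, 5672, 9001) and may differ in your compose file, and an open port does not prove the service is fully healthy.

```python
import socket

# Ports assumed from the example URLs in this guide; adjust to your setup.
SERVICES = {"postgres": 5432, "rabbitmq": 5672, "minio-console": 9001}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def check_services(host: str = "127.0.0.1", services: dict = SERVICES) -> dict:
    """Map each service name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name:14s} {'reachable' if ok else 'NOT reachable'}")
```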
Note
docker-compose.yml also declares api, frontend and the
worker services so that a single docker compose --profile storage
up -d can run the entire platform in containers (production-style
deployment, see docker-compose.prod.yml). For dev work where you
iterate on Python and Next.js code, leave those services down and
use manage.sh instead so file changes hot-reload without an
image rebuild.
Initialise the database
The Compose postgres service runs docker/init.sql at first
volume creation, which only enables the vector extension. Tables
are created either by init_db.py (fresh setup) or by Alembic
migrations (existing schema):
# Fresh: create every table from SQLAlchemy metadata
poetry run python scripts/init_db.py
# Existing: bring schema up to head
poetry run alembic upgrade head
If you are restoring from a backup instead, skip both. See
Operational Runbook “Disaster recovery” for the pg_restore procedure.
Start the application stack
bash scripts/manage.sh start [N] # N = batch workers per pipeline (default 1)
This starts all processes in the background and writes PIDs to logs/pids/:
| Process | Address | Log file |
|---|---|---|
| FastAPI (uvicorn) | http://127.0.0.1:8000 | |
| Workers (ten separate processes) | n/a | |
| Stale job reaper | n/a | |
| Next.js frontend | http://127.0.0.1:3000 | |
Stack management commands:
bash scripts/manage.sh stop # stop all processes
bash scripts/manage.sh status # show PID, RAM, running/dead per worker
bash scripts/manage.sh logs [name] # tail logs (interactive picker or name fragment)
bash scripts/manage.sh scale <queue> [N] # add N extra workers without restart
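The running/dead check that the status command reports can be approximated from the PID files. The following is a minimal sketch, assuming each file in logs/pids/ is named `<process>.pid` and contains a single integer; that naming, and the helper names, are assumptions rather than the documented behaviour of manage.sh. It relies on POSIX signal semantics.

```python
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without delivering a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_status(pid_dir: str = "logs/pids") -> dict:
    """Map each PID file's stem to True (running) or False (dead or unreadable)."""
    status = {}
    for pid_file in sorted(Path(pid_dir).glob("*.pid")):  # assumed file naming
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            status[pid_file.stem] = False  # file does not hold an integer
            continue
        status[pid_file.stem] = pid_alive(pid)
    return status
```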
Verify the installation
Open http://127.0.0.1:3000 in a browser and submit a ping job from the
UI. The job should transition QUEUED → RUNNING → SUCCEEDED within a
second. The event timeline will show a ping.pong event.
Alternatively, use the API directly:
curl -s -X POST http://127.0.0.1:8000/jobs \
  -H "Content-Type: application/json" \
  -d '{"operation":"ping","queue_name":"protea.ping","payload":{}}' | python -m json.tool
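The same request can be issued from Python using only the standard library. This is a sketch that mirrors the curl call above: the endpoint, headers and body come from that command, while the function names are illustrative and the response is assumed to be JSON (as the `json.tool` pipe suggests).

```python
import json
import urllib.request


def build_ping_request(api_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the POST /jobs request for a ping job without sending it."""
    body = {"operation": "ping", "queue_name": "protea.ping", "payload": {}}
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/jobs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_ping(api_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> dict:
    """Send the ping job and return the decoded JSON response."""
    with urllib.request.urlopen(build_ping_request(api_url), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `submit_ping()` requires the stack to be running; `build_ping_request()` can be inspected offline.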
Expose to the internet
To share PROTEA with an external reviewer (e.g. a supervisor) without a public server, run:
bash scripts/expose.sh
The script uses ngrok with a free static domain
(protea.ngrok.app).
It opens a single tunnel to the Next.js frontend (:3000).
API calls are transparently proxied through the frontend via the
/api-proxy/:path* rewrite rule in apps/web/next.config.ts, so the
API port (:8000) is never exposed directly.
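To make the proxying concrete, the path mapping performed by such a rewrite rule can be modelled in a few lines. This is only an illustration: the actual rule lives in apps/web/next.config.ts, and the destination origin used here (http://127.0.0.1:8000 with the remaining path appended unchanged) is an assumption based on the API address used elsewhere in this guide.

```python
from typing import Optional


def rewrite_api_path(
    path: str,
    api_origin: str = "http://127.0.0.1:8000",  # assumed rewrite destination
    prefix: str = "/api-proxy",
) -> Optional[str]:
    """Return the upstream API URL for a proxied path, or None if it is not proxied."""
    if path != prefix and not path.startswith(prefix + "/"):
        return None  # ordinary frontend route, served by Next.js itself
    rest = path[len(prefix):].lstrip("/")  # the segments matched by :path*
    return f"{api_origin.rstrip('/')}/{rest}"
```

For example, a browser request to `/api-proxy/jobs` on the tunnel domain would map to `/jobs` on the API under these assumptions, while `/about` is left to the frontend.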
Prerequisites:
Install ngrok: https://ngrok.com/download
Authenticate once:
ngrok config add-authtoken <TOKEN>
Press Ctrl+C to close the tunnel.
Note
The stack must already be running (bash scripts/manage.sh start) before
calling expose.sh.
Run tests
# Unit tests (no external services required)
poetry run pytest
# Integration tests (pulls a pgvector/pg16 Docker image)
poetry run pytest --with-postgres
# Single test
poetry run pytest tests/test_insert_proteins.py::TestInsertProteinsPayload -v