ADR-003: Two types of consumer

Date:

2026-01-10

Author:

frapercan

Status:

Accepted

Context

Distributed pipelines (compute_embeddings, predict_go_terms) split work into hundreds of batches. If each batch had its own Job row in the DB:

  • The jobs table fills with thousands of rows per prediction run, burying the real user-facing jobs.

  • Each batch pays the cost of the two-session pattern (2 round-trips), which for 2-8s batches is more overhead than useful work.

Decision

Two consumers coexist:

QueueConsumer. For user-facing jobs with full lifecycle tracking:

  • Receives {"job_id": "<uuid>"} and delegates to BaseWorker.handle_job().

  • Used by: protea.ping, protea.jobs, protea.training, protea.embeddings, protea.predictions, protea.evaluations.
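
As a rough illustration of the QueueConsumer message contract, the sketch below parses the {"job_id": "<uuid>"} body and hands the id to a callback standing in for BaseWorker.handle_job(). The function name handle_queue_message, the callback signature, and the boolean ack/nack return value are assumptions made for this example, not the project's actual API.

```python
import json
import uuid
from typing import Callable


def handle_queue_message(body: bytes, handle_job: Callable[[uuid.UUID], None]) -> bool:
    """Parse a QueueConsumer message; return True to ack, False to nack.

    `handle_job` is a hypothetical stand-in for BaseWorker.handle_job().
    """
    try:
        message = json.loads(body)
        raw_id = message["job_id"]
        if not isinstance(raw_id, str):
            return False  # job_id must be a UUID string
        job_id = uuid.UUID(raw_id)
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing key, non-dict body, or invalid UUID.
        return False
    handle_job(job_id)
    return True
```

The message carries only the job id; everything else about the job is assumed to live in its DB row, which is what makes full lifecycle tracking possible on this path.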

OperationConsumer. For ephemeral batches with no individual DB row:

  • Receives {"operation": "...", "job_id": "<parent>", "payload": {...}}.

  • Executes the operation in a single session, then acks or nacks the message. No Job row is created for the batch.

  • Progress is reported by incrementing progress_current on the parent job.

  • Events are written to the parent's log with the "child." prefix.

  • Used by: protea.embeddings.batch, protea.embeddings.write, protea.predictions.batch, protea.predictions.write.
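
The OperationConsumer behaviour listed above could be sketched roughly as follows. This is an in-memory illustration only: ParentJob, the registry dict, the event names after the "child." prefix, and the boolean ack/nack return value are all assumptions for the example; the real consumer works against the DB and the message broker.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ParentJob:
    """Hypothetical in-memory stand-in for the parent (coordinator) Job row."""
    progress_current: int = 0
    events: list[str] = field(default_factory=list)


Operation = Callable[[dict[str, Any]], None]


def handle_operation_message(
    body: str,
    registry: dict[str, Operation],
    jobs: dict[str, ParentJob],
) -> bool:
    """Run one ephemeral batch; return True to ack, False to nack."""
    message = json.loads(body)
    parent = jobs[message["job_id"]]           # the batch has no row of its own
    operation = registry[message["operation"]]
    try:
        operation(message["payload"])
    except Exception as exc:
        # Failures are logged on the parent; the message is nacked (and may reach the DLQ).
        parent.events.append(f"child.failed: {exc}")
        return False
    parent.progress_current += 1               # progress is reported on the parent
    parent.events.append(f"child.{message['operation']}.done")
    return True
```

Because the only state touched is the parent's counter and log, a run with hundreds of batches still shows up as a single job whose progress advances.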

From the outside, the user sees a single job (the coordinator) with a progress bar that advances. Batches are invisible.

Consequences

  • Two code paths for consuming messages, but both are short (~100 lines) and share infrastructure (DLQ, registry, emit).

  • If a batch fails and goes to the DLQ, there is no individual retry counter, just the dead message for inspection.

Rejected alternatives

  • Job with is_batch=True flag: still creates thousands of DB rows.

  • Fire-and-forget without tracking: operators lose visibility into progress and failures.