ADR-009: Pre-dispatch cancellation nack in QueueConsumer
- Status: Accepted
- Date: 2026-05-16
- PR: #373 (fix(queue): nack on cancellation and RetryLaterError)
Context
QueueConsumer pre-acks every delivery before handing it to the
worker. This protects long-running jobs from RabbitMQ’s
consumer_timeout: a delivery that stays unacknowledged past the
timeout makes the broker close the channel, which triggers a reconnect
storm. Because of the pre-ack, once a message is consumed the only way
to “return” it to the queue is to republish it (as RetryLaterError does).
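A minimal sketch of the pre-ack pattern above, assuming a pika-style channel that exposes basic_ack and basic_publish, and a worker callable. FakeChannel, on_message_pre_ack and this RetryLaterError class are hypothetical stand-ins for illustration, not QueueConsumer’s actual code. Republishing through the default exchange ("" with the queue name as routing key) is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


class RetryLaterError(Exception):
    """Hypothetical stand-in: a worker raises this to ask for a later retry."""


@dataclass
class FakeChannel:
    """Records broker calls instead of talking to RabbitMQ (hypothetical test double)."""
    calls: list = field(default_factory=list)

    def basic_ack(self, delivery_tag: int) -> None:
        self.calls.append(("ack", delivery_tag))

    def basic_publish(self, exchange: str, routing_key: str, body: bytes) -> None:
        self.calls.append(("publish", routing_key, body))


def on_message_pre_ack(channel, delivery_tag: int, queue: str, body: bytes,
                       worker: Callable[[bytes], None]) -> None:
    # Ack before running the worker, so a long job can never exceed consumer_timeout.
    channel.basic_ack(delivery_tag=delivery_tag)
    try:
        worker(body)
    except RetryLaterError:
        # The delivery is already acked; republishing is the only way to "return" it.
        channel.basic_publish(exchange="", routing_key=queue, body=body)
```

The trade-off is visible in the call order: the ack always comes first, so the broker never sees a slow delivery, but it also forgets the message, and any retry has to be a fresh publish.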
Before PR #373, a Job cancelled through the API/UI while its message
was already queued in RabbitMQ was still delivered to a worker. The
worker’s _claim_job guard detected the CANCELLED status and exited
cleanly, but only after the message had held a prefetch slot for the
full round-trip. With prefetch_count=1 on the prediction queue, a
backlog of messages for cancelled jobs could stall all legitimate work
queued behind them.
Decision
Check Job.status == CANCELLED inside QueueConsumer._on_message,
before pre-acking and before entering the worker. If cancelled, call
channel.basic_nack(requeue=False) to discard the message
immediately.
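The decision can be sketched as follows. This is a simplified illustration, not the code merged in PR #373: JobStatus, load_status, dispatch and FakeChannel are hypothetical, and the fake channel only mimics the basic_ack/basic_nack keyword arguments of a pika-style channel.

```python
import enum
from dataclasses import dataclass, field
from typing import Callable


class JobStatus(enum.Enum):
    QUEUED = "queued"
    RUNNING = "running"
    CANCELLED = "cancelled"


@dataclass
class FakeChannel:
    """Records broker calls instead of talking to RabbitMQ (hypothetical test double)."""
    calls: list = field(default_factory=list)

    def basic_ack(self, delivery_tag: int) -> None:
        self.calls.append(("ack", delivery_tag))

    def basic_nack(self, delivery_tag: int, requeue: bool = True) -> None:
        self.calls.append(("nack", delivery_tag, requeue))


def on_message(channel, delivery_tag: int, job_id: str,
               load_status: Callable[[str], JobStatus],
               dispatch: Callable[[str], None]) -> str:
    # Point-in-time status check, before the pre-ack and before the worker runs.
    if load_status(job_id) is JobStatus.CANCELLED:
        # requeue=False tells the broker not to put the message back on the queue.
        channel.basic_nack(delivery_tag=delivery_tag, requeue=False)
        return "discarded"
    channel.basic_ack(delivery_tag=delivery_tag)  # the existing pre-ack
    dispatch(job_id)
    return "dispatched"
```

One RabbitMQ detail to keep in mind: if the queue has a dead-letter exchange configured, a message nacked with requeue=False is dead-lettered rather than dropped.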
The check is a point-in-time DB query; it accepts two failure modes:
- A race where the job is cancelled after the check passes: the worker still exits at the _claim_job guard, as before.
- A transient DB error: log a warning and proceed with dispatch (fail open) so that a DB blip does not orphan legitimate jobs.
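The fail-open branch might look like this. TransientDBError and should_discard are hypothetical names; a real implementation would catch whatever exception type the project’s DB layer raises for a transient failure. The race case needs no extra code here, since the worker’s existing _claim_job guard still covers it.

```python
import logging
from typing import Callable

logger = logging.getLogger("queue_consumer_sketch")


class TransientDBError(Exception):
    """Hypothetical stand-in for the DB layer's transient-failure exception."""


def should_discard(job_id: str, load_status: Callable[[str], str]) -> bool:
    try:
        status = load_status(job_id)
    except TransientDBError as exc:
        # Fail open: a DB blip must not orphan a legitimate job, so dispatch anyway.
        logger.warning("cancellation check failed for job %s: %s; dispatching", job_id, exc)
        return False
    # Only a positively observed CANCELLED status leads to a nack.
    return status == "CANCELLED"
```

Failing open means a cancelled job may occasionally reach a worker during a DB outage, which is exactly the pre-PR behaviour and is already handled by _claim_job.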
Consequences
- Cancelled messages are drained from the queue without holding a prefetch slot for a worker round-trip.
- Prefetch=1 queues (predictions) can no longer stall behind a backlog of cancellations.
- The DB gains one extra SELECT per delivery on the job-backed queues (protea.ping, protea.jobs, protea.embeddings). These queues are low-throughput, so the overhead is negligible.
- OperationConsumer is unaffected (no Job row to cancel).
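To make the cost and scope concrete, here is a sketch in which the cancellation check is an overridable hook. All class names are hypothetical, and StatusStore merely counts lookups where the real consumer would issue a SELECT: the job-backed consumer pays one lookup per delivery, while the operation consumer keeps the default hook and issues none.

```python
from dataclasses import dataclass


@dataclass
class StatusStore:
    """Hypothetical in-memory stand-in for the Job table; counts lookups like SELECTs."""
    statuses: dict
    queries: int = 0

    def status_of(self, job_id: str) -> str:
        self.queries += 1
        return self.statuses[job_id]


class BaseConsumerSketch:
    def is_cancelled(self, payload: dict) -> bool:
        # Default: nothing to cancel, so no DB round-trip.
        return False

    def on_message(self, payload: dict) -> str:
        return "nack" if self.is_cancelled(payload) else "ack+dispatch"


class JobConsumerSketch(BaseConsumerSketch):
    def __init__(self, store: StatusStore) -> None:
        self.store = store

    def is_cancelled(self, payload: dict) -> bool:
        # Exactly one status lookup per delivery on job-backed queues.
        return self.store.status_of(payload["job_id"]) == "CANCELLED"


class OperationConsumerSketch(BaseConsumerSketch):
    """Operation payloads carry no Job row, so the default hook is kept."""
```

Keeping the check behind a hook confines the extra query to consumers that actually have a Job row to consult.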