ADR-D34: Selective rerank resurrection, recompute not archaeology
- Status: Accepted
- Date: 2026-05-16 (proposed), 2026-05-17 (accepted with multi-seed numbers), 2026-05-18 (closure ratified by lab LR.4 + paired CI from LB.3)
Note
Terminology. Earlier drafts called the pre-fix dataset
“leakage-contaminated”. That framing is technically inaccurate: the
CAFA temporal partition (NK / LK / PK in
CAFA Evaluation Protocol) is mathematically clean; PK simply
records that the protein had experimental annotations in some
namespace at t0, which is a legitimate evaluation split, not a leak.
The actual incident was a dataset-construction replication
artefact in the early export_research_dataset pipeline that
made anc2vec_query_known_count act as a row-bucket identifier;
see the Replication artefact in the anc2vec_query feature family
entry in Operational Insights and Lessons Learned for the corrected description. Wording in
this ADR has been updated accordingly, except for frozen artefact
identifiers (e.g. the lab axis value
feat=v6_features+lineage-leakfree and the memory keys
project_lb2_leakage_fixed_champion and
project_anc2vec_leakage_mechanism) where the literal string is
load-bearing.
Context
The reranker lab maintains validation bands for bench-v1-K5 across the v220,
v226, and v230 lineages. The LAFA submission uses the v226-v230 band. The
historical “selective rerank at K=10” champion record (avg cafaeval
0.4562) existed only as a memory entry. That record predated the
range distinction and was not generated with explicit eval_set_name
tracking.
Lab memory flagged the legacy record as non-reproducible (pre-replication-fix) and of
unknown range provenance. The lab summary file under the replication-fixed
bench runs directory does
not contain 0.4562 for bench-v1-K5-v226-lineage-prostt5 or for any other current
validation band, so the record can be neither reproduced nor
compared with current champion runs.
On 2026-05-05 the anc2vec_query_known_count replication-artefact finding (memory
project_anc2vec_leakage_mechanism) confirmed that any pre-fix Fmax is
inflated and must be discarded; the only correct path is to recompute
on the replication-fixed feature set against the current bench.
On 2026-05-17 the LB.2 multi-seed sweep landed (lab branch
task/bioinfo-quick-1778972872-6d9a, commit 77c3b33):
6 NK+LK cells (nk-mfo, nk-bpo, nk-cco, lk-mfo, lk-bpo, lk-cco) trained
for 3 seeds each (42, 7, 137) on the replication-fixed bench-v1-K5-v226-lineage-prostt5 configuration
(no anc2vec, no PCA features; lambdarank; LR=0.05, leaves=63,
num_boost_round=10000, early_stop=100). The 9-cell selective policy
applies the reranker on NK+LK cells and falls back to KNN baseline on
PK cells, where lineage features induce a DAG-closure shortcut that
overfits.
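The routing rule and the feature exclusion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PROTEA implementation: the function names, the returned labels, and the example column names knn_score and lineage_depth are hypothetical. Only the cell names and the anc2vec_* / emb_pca_* exclusion patterns come from this ADR.

```python
from fnmatch import fnmatchcase

# Column patterns dropped from the replication-fixed bundle (named in this ADR).
EXCLUDED_PATTERNS = ("anc2vec_*", "emb_pca_*")

# Partitions and namespaces that make up the 9 cells.
PARTITIONS = ("nk", "lk", "pk")
NAMESPACES = ("mfo", "bpo", "cco")

# Partitions on which the reranker is applied; PK falls back to the KNN baseline.
RERANKED_PARTITIONS = frozenset({"nk", "lk"})


def select_feature_columns(columns):
    """Keep every column that matches none of the excluded patterns."""
    return [
        col for col in columns
        if not any(fnmatchcase(col, pat) for pat in EXCLUDED_PATTERNS)
    ]


def scorer_for_cell(cell):
    """Return a hypothetical label for the scorer that serves a cell such as 'nk-mfo'."""
    partition, _, namespace = cell.partition("-")
    if partition not in PARTITIONS or namespace not in NAMESPACES:
        raise ValueError(f"unknown cell: {cell!r}")
    return "reranker" if partition in RERANKED_PARTITIONS else "knn_baseline"


if __name__ == "__main__":
    for part in PARTITIONS:
        for ns in NAMESPACES:
            cell = f"{part}-{ns}"
            print(cell, "->", scorer_for_cell(cell))
```

Keeping the routing as a pure function of the cell name makes the 6 + 3 split easy to test independently of any trained booster.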
Decision
1. Recompute, not archaeology. When historical records conflict with, or cannot be reproduced on, current validation data, recompute on the current bench rather than reverse-engineer the old configuration.
2. Selective rerank on NK+LK; KNN baseline fallback on PK. This is the live PROTEA inference policy for bench-v1-K5-v226-lineage-prostt5 pending the next champion sweep.
3. Configuration: replication-fixed bench-v1-K5-v226-lineage-prostt5 bundle (the v6_features bundle with lineage features enabled, minus all anc2vec_* and emb_pca_* columns), per-cell lambdarank LightGBM booster.
4. Champion numbers (multi-seed, 2026-05-17):
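Per-cell cafaeval Fmax, mean over seeds 42 / 7 / 137

======  ======  ======  ======  ======  =================  ==============
cell    s42     s7      s137    mean    95% CI half-width  baseline (KNN)
======  ======  ======  ======  ======  =================  ==============
nk-mfo  0.7112  0.7041  0.7041  0.7065  0.0036             0.6447
nk-bpo  0.5599  0.5571  0.5618  0.5596  0.0024             0.5333
nk-cco  0.7733  0.7830  0.7758  0.7774  0.0048             0.7000
lk-mfo  0.6877  0.6786  0.6757  0.6806  0.0060             0.5816
lk-bpo  0.6472  0.6421  0.6485  0.6460  0.0032             0.5844
lk-cco  0.7434  0.7252  0.7417  0.7367  0.0091             0.7053
======  ======  ======  ======  ======  =================  ==============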
Aggregate numbers:
- 6-cell NK+LK reranker avg: 0.6845.
- 9-cell selective avg (6 NK+LK rerank + 3 PK baseline fallback): 0.6215 ± 0.0014 (95% CI half-width on the 9-cell mean, derived from a 10000-iteration bootstrap of the 3-seed mean per cell).
- PK baseline values (used in fallback): pk-mfo 0.483, pk-bpo 0.403, pk-cco 0.601.
5. Supersedes the legacy memory-only 0.4562 record. The legacy record (memory key project_v18_selective_rerank) is retained for audit and explicitly marked as superseded. It is not comparable to the new champion: different feature set, different range, non-reproducible (pre-replication-fix).
6. Deployment. The replication-fixed bench-v1-K5-v226-lineage-prostt5 config becomes the PROTEA inference default for bench-v1-K5-v226-lineage-prostt5 on NK+LK cells. Older RerankerModel rows from pre-replication-fix sweeps are considered stale; they remain in the registry for traceability but should not be selected for new inference jobs.
7. FARM-EXP.10 closure. The slice scope changed from “reconstruct axis tuple from RerankerModel table” to “re-train with current policy on current bench”. This ADR records the multi-seed acceptance.
8. Lab LR.4 closure (2026-05-18). Lab PR protea-reranker-lab#21 (lab(LR.4): replication-artefact-free re-run of the historical selective-rerank policy) formalises the supersession on the lab side. It lands scripts/lr4_v18_selective.py (a regenerator that reads runs/lb2_multiseed/cis.json or falls back to documented canonical numbers), experiments/lr4/v18_selective_delta.csv (the canonical acceptance artefact: per-cell selective rerank table plus an aggregate delta row), and an EXPERIMENTS.md LR.4 closure section. The CSV reports the same-bench all-baseline reference: the pre-fix 0.4562 had no per-cell breakdown on file, so the lift is reported against the new bench baseline. The publishable selective-rerank lift on bench-v1-K5-v226-lineage-prostt5 is +0.0397 over the same-bench KNN baseline. The legacy record is also marked superseded in the lab champions appendix.
9. Paired CI evidence (LB.3, 2026-05-18). Lab PR protea-reranker-lab#19 (lab(LB.3): per-cell paired CI on replication-fixed champion) supplies the per-cell paired confidence intervals for the recomputed run. All 6 NK+LK cells are significant at the 95% level (6 of 6); see memory project_lb3_paired_ci_2026_05_18. This is the statistical evidence supporting the deployment policy in points 2 and 6 above.
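The aggregate figures in the Decision can be re-derived from the per-cell means and KNN baselines. The sketch below is plain arithmetic on numbers copied from this ADR. It assumes the +0.0397 lift is the difference between the 9-cell selective mean and the 9-cell all-baseline mean, which is an inference from the text rather than a reading of scripts/lr4_v18_selective.py. All variable names are illustrative.

```python
from statistics import mean

# Per-cell 3-seed mean Fmax for the reranked NK+LK cells (from the table in this ADR).
RERANK_MEAN = {
    "nk-mfo": 0.7065, "nk-bpo": 0.5596, "nk-cco": 0.7774,
    "lk-mfo": 0.6806, "lk-bpo": 0.6460, "lk-cco": 0.7367,
}

# Same-bench KNN baseline Fmax for all 9 cells (from this ADR).
KNN_BASELINE = {
    "nk-mfo": 0.6447, "nk-bpo": 0.5333, "nk-cco": 0.7000,
    "lk-mfo": 0.5816, "lk-bpo": 0.5844, "lk-cco": 0.7053,
    "pk-mfo": 0.483, "pk-bpo": 0.403, "pk-cco": 0.601,
}

# Selective policy: reranker score where available, KNN baseline otherwise (PK cells).
selective = {cell: RERANK_MEAN.get(cell, base) for cell, base in KNN_BASELINE.items()}

six_cell_avg = mean(RERANK_MEAN.values())
selective_avg = mean(selective.values())
all_baseline_avg = mean(KNN_BASELINE.values())
lift = selective_avg - all_baseline_avg  # assumed definition of the reported lift

print(f"6-cell NK+LK reranker avg: {six_cell_avg:.4f}")      # 0.6845
print(f"9-cell selective avg:      {selective_avg:.4f}")     # 0.6215
print(f"9-cell all-baseline avg:   {all_baseline_avg:.4f}")  # 0.5818
print(f"selective lift:            {lift:+.4f}")             # +0.0397
```

The bootstrap half-width (± 0.0014) is not reproduced here because the resampling procedure is only summarised in this ADR.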
Consequences
Positive
- Eliminates the need to reverse-engineer unknown historical configs.
- Produces a valid, reproducible champion record with full range traceability (eval_set_name pinned).
- Establishes a scalable pattern for future legacy-record conflicts: recompute on the current bench.
- All 6 NK+LK cells show strictly positive lift across all 3 seeds (max CI half-width 0.0091 on lk-cco). The selective policy is robust to seed variation.
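The seed-robustness claim can be checked directly against the per-seed values reported in the Decision. The sketch below is a minimal check, with data copied from this ADR and illustrative variable names. It compares each cell's worst seed with its KNN baseline and locates the widest CI half-width.

```python
# Per-seed cafaeval Fmax for seeds 42, 7, 137 (from the table in this ADR).
SEED_FMAX = {
    "nk-mfo": (0.7112, 0.7041, 0.7041),
    "nk-bpo": (0.5599, 0.5571, 0.5618),
    "nk-cco": (0.7733, 0.7830, 0.7758),
    "lk-mfo": (0.6877, 0.6786, 0.6757),
    "lk-bpo": (0.6472, 0.6421, 0.6485),
    "lk-cco": (0.7434, 0.7252, 0.7417),
}
KNN_BASELINE = {
    "nk-mfo": 0.6447, "nk-bpo": 0.5333, "nk-cco": 0.7000,
    "lk-mfo": 0.5816, "lk-bpo": 0.5844, "lk-cco": 0.7053,
}
CI_HALF_WIDTH = {
    "nk-mfo": 0.0036, "nk-bpo": 0.0024, "nk-cco": 0.0048,
    "lk-mfo": 0.0060, "lk-bpo": 0.0032, "lk-cco": 0.0091,
}

# Lift of the worst seed over the baseline, per cell.
worst_seed_lift = {
    cell: min(scores) - KNN_BASELINE[cell] for cell, scores in SEED_FMAX.items()
}
all_positive = all(lift > 0 for lift in worst_seed_lift.values())

# Cell with the widest CI half-width and cell with the smallest worst-seed lift.
widest_ci_cell = max(CI_HALF_WIDTH, key=CI_HALF_WIDTH.get)
tightest_cell = min(worst_seed_lift, key=worst_seed_lift.get)

print("all seeds beat baseline:", all_positive)
print("widest CI half-width:", widest_ci_cell, CI_HALF_WIDTH[widest_ci_cell])
print("smallest worst-seed lift:", tightest_cell,
      f"{worst_seed_lift[tightest_cell]:+.4f}")
```

On these numbers the narrowest margin is on lk-cco, which is also the cell with the widest CI half-width.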
Negative
- The legacy 0.4562 record is explicitly marked as not comparable to current champions. Any narrative claiming continuity with the old cell is incorrect.
- Requires regeneration of the cell, not mere documentation of an existing artefact.
- The catalog cell axis.features value for this replication-fixed bundle is not yet a named entry in the FARM-EXP.2 transversal catalog (see project_farm_exp_2_placeholder_digests); a follow-up slice will add the replication-fixed bundle as a first-class axis value once the digest backfill clears.
Neutral
- Memory record project_v18_selective_rerank documents the historical value and its supersession. Future maintainers can cross-reference if needed.
- The 3 PK cells continue to ride the KNN baseline. A future slice (tracked separately) explores known-terms overlap features as a PK-specific lift signal.
References
- Lab champion declaration: EXPERIMENTS.md in frapercan/protea-reranker-lab (FARM-EXP.10 champion section, LB.2 multi-seed sweep section, LR.4 closure section).
- LB.2 multi-seed sweep commit: protea-reranker-lab branch feat/FARM-EXP.10-transversal-champion (cherry-pick of 77c3b33 from task/bioinfo-quick-1778972872-6d9a).
- Lab LR.4 closure PR: protea-reranker-lab#21 (merged 2026-05-18, commit 1c17f75); ships scripts/lr4_v18_selective.py, experiments/lr4/v18_selective_delta.csv, and the LR.4 EXPERIMENTS.md closure section.
- Lab LB.3 paired CI PR: protea-reranker-lab#19 (merged 2026-05-18); per-cell paired confidence intervals (6 of 6 NK+LK cells significant at the 95% level).
- Memory entry project_lb2_leakage_fixed_champion (publishable numbers, 0.6215 ± 0.0014 selective avg).
- Memory entry project_lb3_paired_ci_2026_05_18 (paired CI evidence, 6 of 6 NK+LK cells significant).
- Memory entry project_anc2vec_leakage_mechanism (root cause for the supersession of 0.4562).
- Memory entry project_v18_selective_rerank (legacy champion, marked historical-only and superseded 2026-05-18).
- Memory entry feedback_no_archaeology_recompute (policy decision; canonical wording of the recompute-not-archaeology rule).
- Memory entry reference_lab_validation_ranges (v220 / v226 / v230 distinction).
- Memory entry project_farm_exp_2_placeholder_digests (catalog shortid tentativeness; relevant to the replication-fixed bundle’s pending first-class axis registration).
- FARM-EXP.10 slice definition in agent-farm/plans/farm-platform/PLAN.md.
- Lab summary file under the replication-fixed bench runs directory (current bench results).
Open follow-ups (deferred work, tracked separately):
- runs/transversal/<shortid>/ placement of per-seed FARM-EXP.3-format run.json records is deferred to the writer slice (FARM-EXP.5 and later), which is the slice that emits the axis block plus the fmax_samples array consumed by scripts/update_champions.py. Until that slice lands, the champion declaration lives in the lab champions.md manual appendix (delimited by MANUAL_ENTRIES_BEGIN / MANUAL_ENTRIES_END markers) so that re-renders do not strip it. See lab PR protea-reranker-lab#15 (FARM-EXP.10 formalisation) and the LR.4 PR for the appendix-survival test cases.
- The replication-fixed bundle (the feat=v6_features+lineage-leakfree axis value: the base v6_features+lineage bundle minus anc2vec_* and emb_pca_*) was added as a first-class entry in the FARM-EXP.2 transversal catalog via FARM-EXP.10b (lab PR protea-reranker-lab#16); the digest-backfill slice that clears the placeholder shortids documented in project_farm_exp_2_placeholder_digests is the remaining prerequisite for stable axis-tuple identifiers.
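The appendix-survival behaviour mentioned in the first follow-up can be illustrated with a small sketch. It is not the code in scripts/update_champions.py: the function names are hypothetical, and it assumes the two markers each appear on their own line in champions.md, in whatever comment syntax the file uses. The sketch carries the delimited block from the previous file over to the freshly rendered one.

```python
BEGIN_MARKER = "MANUAL_ENTRIES_BEGIN"
END_MARKER = "MANUAL_ENTRIES_END"


def extract_manual_block(text):
    """Return the lines from the BEGIN marker line to the END marker line, inclusive.

    Returns an empty list when either marker is missing or they are out of order.
    """
    lines = text.splitlines()
    begin = next((i for i, line in enumerate(lines) if BEGIN_MARKER in line), None)
    end = next((i for i, line in enumerate(lines) if END_MARKER in line), None)
    if begin is None or end is None or end < begin:
        return []
    return lines[begin:end + 1]


def rerender(previous_text, generated_text):
    """Append the manual block of the previous file to newly generated content."""
    manual = extract_manual_block(previous_text)
    out = generated_text.rstrip("\n").splitlines()
    if manual:
        out.append("")  # blank line between generated content and the appendix
        out.extend(manual)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    previous = (
        "champions\n"
        "old generated row\n"
        "<!-- MANUAL_ENTRIES_BEGIN -->\n"
        "selective avg 0.6215 (manual entry)\n"
        "<!-- MANUAL_ENTRIES_END -->\n"
    )
    print(rerender(previous, "champions\nnew generated row\n"))
```

Re-applying rerender to its own output leaves the manual block in place, which is the property the appendix-survival tests are described as covering.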