ADR-D2: export_research_dataset lives in protea-core
- Status: Accepted
- Date: 2026-05-05
- Phase: F1
Context
The export operation produces frozen train.parquet and eval.parquet
artefacts consumed by the LightGBM lab. It needs the feature schema, the
KNN reference cache, and access to the relational data model. Two options
were considered: keep it in protea-core, or move it into the
protea-runners repository alongside the LightGBM trainer.
Decision
Keep export_research_dataset in protea-core. The feature schema is
imported from protea-contracts.
Consequences
- Schema bumps in protea-contracts force a new protea-core release but not a new protea-runners release.
- The lab consumes the dataset through the artifact store, so there is no Python import coupling.
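The decoupling described under Consequences can be illustrated with a minimal sketch. It assumes, hypothetically, that protea-core writes a small JSON manifest (here `manifest.json`) next to train.parquet and eval.parquet. The manifest records the feature schema version and column names, and the lab reads the column list from that file instead of importing the schema. The manifest format and the functions `write_manifest` and `read_feature_columns` are illustrative only. They are not part of protea-core, protea-contracts, or protea-runners. Parquet I/O is left out so the example runs with the standard library alone.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sidecar file; not an API of protea-core or protea-contracts.
MANIFEST_NAME = "manifest.json"


def write_manifest(out_dir: Path, schema_version: str, feature_columns: list[str]) -> Path:
    """Producer side (protea-core): record the schema next to the frozen artefacts."""
    manifest = {
        "schema_version": schema_version,
        "feature_columns": feature_columns,
        "files": ["train.parquet", "eval.parquet"],
    }
    path = out_dir / MANIFEST_NAME
    path.write_text(json.dumps(manifest, indent=2))
    return path


def read_feature_columns(out_dir: Path) -> tuple[str, list[str]]:
    """Consumer side (LightGBM lab): learn the schema from the artefact itself.

    Nothing from protea-core or protea-contracts is imported here, so a
    schema bump changes the manifest contents, not the lab's code.
    """
    manifest = json.loads((out_dir / MANIFEST_NAME).read_text())
    return manifest["schema_version"], list(manifest["feature_columns"])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        artefact_dir = Path(tmp)
        # Placeholder column names, purely illustrative.
        write_manifest(artefact_dir, "2.0.0", ["feature_a", "feature_b"])
        version, cols = read_feature_columns(artefact_dir)
        print(version, cols)  # 2.0.0 ['feature_a', 'feature_b']
```

This only avoids a protea-runners release for schema changes that the lab's training code can absorb by reading column names at load time, such as added features. A change to what an existing column means would still need work on the lab side.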
Resolution
Closed.