AN-014: Leakage audit (D3 diagnostic)¶
Intuition (plain-language)
When a screen is built and scored on the same items, structural reuse can inflate performance ("leakage"). We tighten the evaluation in three steps: raw item-level (0.995), out-of-fold by firm (0.891), temporal holdout (0.864). The drop is real but bounded — the honest discriminating signal lives in the 0.86–0.89 band, comfortably above random. The tell that it is genuine: against direct defendants the AUC stays ~0.51 under every regime, so the screen isn't memorizing identities, it is ranking loser-side behavior.
Question¶
How much does item-level evaluation leak relative to out-of-fold and temporal-holdout retraining? The audit is the transparency block on the generalization gap.
Design¶
- Sample: item × firm panel in BEC 2009–2019.
- Steps:
- Raw item-level AUC (in-sample item-firm panel — leaks repeated-firm structure).
- Out-of-fold cross-validation at cobidder-firm level (no firm appears in both train and test folds).
- Temporal holdout: train 2009–2016, test 2017–2019.
- Outcome: cobidder AUC at each step; direct-defendant AUC as null reference at each step.
Results¶
| Step | Cobidder AUC | Direct-defendant AUC |
|---|---|---|
| Raw item-level | 0.995 [0.995, 0.995] | 0.506 [0.505, 0.507] |
| Out-of-fold CV at firm | 0.891 [0.887, 0.894] | — |
| Temporal holdout (firm) | 0.864 [0.858, 0.870] | 0.511 [0.510, 0.513] |
Macros: \valAUCitemRaw, \valAUCitemCV, \valAUCitemCVCI,
\valAUCitemTemp, \valAUCitemTempCI, \valAUCitemDirect,
\valAUCItemDirectTemp, \valLeakStructLow, \valLeakStructHigh,
\valLeakTautLow, \valLeakTautHigh.
The cobidder AUC drops 0.104 from raw to OOF, and another 0.027 from
OOF to temporal holdout — totalling ~0.10–0.13 attenuation
(\valLeakTautLow–\valLeakTautHigh). The remaining AUC band of
0.86–0.89 (\valLeakStructLow–\valLeakStructHigh) is the structural
discriminating performance the operational claim relies on.
Figure: sensitivity contour for the leakage audit, showing how AUC attenuates as evaluation regime tightens (raw item-level → OOF firm- level → temporal holdout). The structural discriminating band sits at ~0.86–0.89; direct-defendant AUC stays at ~0.51 across regimes (predicted null, AN-007).
Interpretation¶
Verdict: DEFENSIBLE. Item-level evaluation leaks repeated-firm structure, inflating AUC to ~1.0. The disciplined OOF and temporal- holdout columns drop the AUC by 0.10–0.13 but leave it above 0.85 — operationally informative. Direct-defendant AUC stays at ~0.51 in every scenario (predicted null, AN-007), confirming the structural scope limit independently of evaluation regime.
The audit is reported as an anti-leakage transparency block in the online appendix of the paper. Its purpose is to neutralize the JLEO- reviewer suspicion that ~1.0 AUC numbers signal leakage rather than real discrimination.
Follow-ups¶
- Sensitivity to fold definitions (firm vs item vs item-PBU).
- Re-estimation under alternative cobidder definitions.
- Triangulation with AN-006.
