Joint award + bid scoring is the full-observability upper bound
Intuition (plain-language)
Suppose a regulator had everything — every bid, fully recorded — and ran the classic bid-distribution screens too. Would the cheap loser-side signal still matter? Yes: combining both does best (AUC ≈ 0.96), and the gain over either one alone shows the two are measuring different things. The loser-side signal is not a budget substitute for bid microdata; it carries information the bid screens miss. This combined model is the best-case ceiling — a useful benchmark, but it assumes data a regulator usually does not have.
🟡 On the cobidder target, a classifier that uses both the award-layer score (FL14 or continuous log_tc) and the Imhof-style bid-distribution features achieves the highest AUC: 0.955 [0.943, 0.967] with FL14 + Imhof, and 0.962 [0.954, 0.969] with continuous + Imhof (AN-010).
The two layers are complementary, not substitutes:
- Imhof seven-feature pipeline alone: AUC 0.888 [0.865, 0.911].
- FL14 binary alone: AUC 0.921 [0.914, 0.928].
- Continuous log_tc alone: AUC 0.884 [0.860, 0.908].
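- Joint: AUC 0.955–0.962, a gain of ~0.03–0.08 over each layer individually (smallest for FL14 + Imhof over FL14 alone, 0.955 − 0.921 = 0.034; largest for continuous + Imhof over continuous alone, 0.962 − 0.884 = 0.078).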
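The increments implied by the point estimates listed above can be checked with a few lines of arithmetic. This is a minimal sketch, not one of the backing scripts: the dictionary keys and the `gains` helper are illustrative names, and the differences are plain subtractions of point estimates, not the formal DeLong increments reported under AN-033.

```python
# Point-estimate AUCs quoted in this note (AN-010). Key names are illustrative.
SINGLE = {"imhof": 0.888, "fl14": 0.921, "log_tc": 0.884}

# Each joint model: (AUC, the two layers it combines).
JOINT = {
    "fl14+imhof": (0.955, ("fl14", "imhof")),
    "log_tc+imhof": (0.962, ("log_tc", "imhof")),
}


def gains(joint=JOINT, single=SINGLE):
    """Return {(joint_name, layer): joint AUC minus that layer's solo AUC}."""
    out = {}
    for name, (auc, layers) in joint.items():
        for layer in layers:
            out[(name, layer)] = round(auc - single[layer], 3)
    return out


if __name__ == "__main__":
    for (name, layer), g in gains().items():
        print(f"{name} over {layer} alone: +{g:.3f}")
```

On these numbers the joint model adds the least over FL14 alone and the most over continuous log_tc alone; the comparison pairs each joint model only with the two layers it actually combines.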
The joint score is the full-observability upper bound: what an agency could achieve if it had already opened the bid layer for every firm. It is the right benchmark for the gatekeeping comparison in Award-layer gatekeeping cuts the bid-microdata pool by 83%: "how much do we give up by triaging before recovering bid data?"
Note also that the Imhof CV-only specification (the pure bid-distribution moments) achieves AUC = 0.585 [0.553, 0.616] — close to chance. The Imhof pipeline reaches 0.888 only because it includes participation features that are correlated with FL. The award-layer signal is therefore not redundant; it is necessary for the bid-distribution pipeline to reach its headline AUC in the first place.
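What "not redundant" means in AUC terms can be illustrated with a toy case. The sketch below uses six hypothetical firms with made-up scores (not BEC data) and the simplest possible combination, adding the two scores; the note does not specify the joint classifier, so the sum is an assumption made only for illustration. AUC is computed as the share of (positive, negative) pairs ranked correctly, with ties counted as half, so 0.5 corresponds to chance.

```python
def auc(pos, neg):
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive scores higher, counting ties as half a win."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy firms as (award_score, bid_score). NOT real data.
COLLUDERS = [(2, 1), (1, 2), (2, 2)]
OTHERS = [(1, 1), (2, 0), (0, 2)]


def column(rows, k):
    """Extract the k-th score from each firm."""
    return [r[k] for r in rows]


def joint(rows):
    """Assumed combination rule for this toy: add the two scores."""
    return [a + b for a, b in rows]


auc_award = auc(column(COLLUDERS, 0), column(OTHERS, 0))
auc_bid = auc(column(COLLUDERS, 1), column(OTHERS, 1))
auc_joint = auc(joint(COLLUDERS), joint(OTHERS))

if __name__ == "__main__":
    print(f"award only: {auc_award:.3f}")  # 0.722
    print(f"bid only:   {auc_bid:.3f}")    # 0.722
    print(f"joint:      {auc_joint:.3f}")  # 1.000
```

Each score alone misranks some pairs, but the errors fall on different firms, so the combined score separates the two groups completely. The toy overstates the effect on purpose; it shows the mechanism, not the magnitude reported in this note.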
Caveat. The complementarity result is sample-specific (BEC 2009–2019, CADE-adjudicated cobidder labels, pool of 16,779 firms with both award and bid features available). The increment over the bid-only score depends on the Imhof feature set chosen as benchmark; the paper uses the seven-feature canonical pipeline \citep{imhof2018screening,imhof2019detecting,wallimann2023machine}. The reading is 🟡 because the upper-bound claim is restricted to the available adjudication target.
Sources.
- Own analysis: AN-010 (Imhof benchmark + joint), AN-011 (horse race continuous), AN-015 (D1 harmonized same-sample), AN-033 (formal DeLong incremental: Imhof + FL Δ = +0.096, p = 1.2 × 10⁻²⁶; FL marginal beyond TC = +0.003), AN-034 (sequential envelope demonstrates complementarity at operational level).
- Cross-refs: H:award-bid-complementarity; docs/results.md.
- Macros: \valImhofFull (0.888), \valImhofFLBin (0.921), \valImhofComboBin (0.955), \valImhofComboCont (0.962), \valAUCImhofCV (0.585), \valImhofPoolN (16,779).
- Validation: backing scripts scripts/31_imhof_full_pipeline.R, scripts/49_imhof_incremental_value.R, scripts/34_horse_race_fl_continuous.R, scripts/36_gate_d1_harmonized.R.