Main Results¶
Intuition (plain-language)
Read these results as answers to one practical question: can a signal a regulator already has for free — which firms keep losing — make expensive cartel forensics cheaper? Yes. The persistent-loser ranking separates cartel-linked cover bidders from ordinary losers well enough (AUC ≈ 0.86 out of sample) that using it as a first-stage filter cuts the costly bid-microdata pool by ~83% while still catching about two-thirds of known cobidders. It complements, rather than duplicates, the classic bid-distribution screens. The price results come last and on purpose: they describe where the signal is active, not how much any cartel overcharged.
This page presents the empirical findings in the order the paper now privileges: discrimination first, architectural complementarity second, pricing imprint third (descriptive corroboration only).
1. Discrimination Against the Cobidder Population¶
The screen's primary validation is firm-level discrimination of cobidders inside the always-loser stratum.
Headline AUC¶
| Metric | Value |
|---|---|
| Firm-level AUC, temporal holdout (train 2009–2016, test 2017–2019) | 0.864 [95% CI: 0.858, 0.870] |
| Firm-level AUC, in-sample (binary FL14 flag) | 0.924 [95% CI: 0.921, 0.926] |
| Firm-level AUC, in-sample (continuous log participation) | 0.939 |
| AUC, strict pre-2020 benchmark with participation-stratified permutation null | corroborated by 3.2× excess over random-matching baseline (\(p < 0.001\)) |
| AUC against direct CADE defendants in broader BEC firm universe | ≈ 0.49 (random — by design; loser-side scope, not winner-side identity) |
AUC 0.864 under temporal holdout
The temporal-holdout AUC is the headline reference for all operational claims. The construct is not retrofit on post-2020 data; the score is computed using only 2009–2016 participation and evaluated on items in 2017–2019. Out-of-fold CV at cobidder-firm level returns 0.891 [0.887, 0.894], confirming the structural component dominates.
Leakage audit (decomposition of in-sample item-level AUC)¶
The raw in-sample item-level AUC of 0.995 is partly tautological by construction. The audit decomposes it into structural and leakage components:
| Audit | Specification | AUC | 95% CI |
|---|---|---|---|
| Original | In-sample, full pool | 0.995 | [0.995, 0.995] |
| Audit 1 — scope | Same score, swapped to direct-CADE label | 0.506 | [0.505, 0.507] |
| Audit 2 — tautology | 5-fold CV at cobidder-firm level | 0.891 | [0.887, 0.894] |
| Audit 3 — generalization | Train 2009–2016 → test 2017–2019, any-cobidder | 0.864 | [0.859, 0.870] |
| Audit 3 — direct-CADE label | Same temporal split, against direct-CADE label | 0.511 | [0.510, 0.513] |
The 0.10–0.13 drop from in-sample to audit-corrected AUC is the pure-leakage component. Audits against direct-CADE labels return random AUC, consistent with the loser-side scope.
Formal sham permutation: 32 σ above the volume-only null¶
A formal version of the volume placebo permutes the FL14 label across always-losers while preserving the marginal participation distribution. Across B = 2,000 sham draws, the sham AUC distribution has mean 0.500, SD 0.013, and 99th percentile 0.531. The observed FL14 AUC of 0.924 falls ~32 standard deviations above the sham mean; permutation p < 1/2,000 (decisively rejects the volume-only null at the 99% level). The price coefficient under the same sham does not reject the null (observed +0.064 vs sham mean +0.144, p = 0.989) — the second piece of the scope-not-damages reading.
Three-classifier timing battery: discrimination preserved under strict ex ante training¶
A stricter version of the temporal holdout trains on progressively earlier windows and evaluates against cobidders linked to CADE adjudications closed only after the training window. Even at the strictest horizon — clf_2015 trained on 2009–2015 only, evaluated against cobid_post2019 (truly out-of-time, by construction unavailable in the training data) — the firm-level FL AUC is 0.786 and the continuous AUC is 0.854. clf_2017 returns 0.844 / 0.894 against the same target. Firm persistence between the early and late panel windows is 8.7%: roughly nine in ten test-window firms were not in the training pool. The temporal holdout therefore evaluates a substantially fresh firm population, not the same firms in different years.
Why direct-CADE AUC ≈ 0.49 is the design's empirical signature¶
The flag recovers loser-side participation, not winner-side identity. Direct CADE defendants are by construction the winner side of the same arrangements. The asymmetry between cobidder discrimination (0.864) and direct-defendant discrimination (≈ 0.49) is not a failure of validation but the empirical fingerprint of what a screen built on the loser side can and cannot do.
The stronger version of this scope check uses raw participation count against direct CADE defendants in the full BEC firm universe: AUC = 0.383, which is below random. A high-participation BEC firm is less likely, not more likely, to be a direct CADE defendant under the loser-side scoring logic. The score does not merely fail to identify winner-heavy defendants; the relevant direction is reversed.
2. Architecture: Award-Layer Triage Before Bid-Layer Forensics¶
This is the paper's headline contribution.
Sequential gatekeeper rule¶
| Quantity | Without architecture | With award-layer triage | Improvement |
|---|---|---|---|
| Firms entering forensic stage | 11,676 (all always-losers) | 2,000 (top-\(K_1\) award-ranked survivor pool) | 83% reduction in bid-microdata pool |
| Cobidders recovered (of 193 adjudicated) | 193 (full coverage) | 131 | 68% recall preserved |
| Bid-microdata interrogation cost | full | ~17% of full | ~83% saving |
83% footprint reduction
Routing forensic interrogation through the screen reduces the bid-microdata pool the forensic stage must work on by 83% while still recovering 131 of 193 adjudicated cobidders. The recall robustness survives the temporal-holdout audit.
Complementarity with bid-distribution screens¶
Against the seven-feature Imhof–Wallimann bid-distribution pipeline trained on the forensic-recoverable layer:
| Model (same five-fold CV sample, \(N = 16{,}779\)) | AUC | 95% CI |
|---|---|---|
| Frequent-loser flag (alone) | 0.921 | [0.914, 0.928] |
| Participation intensity (continuous, alone) | 0.884 | [0.860, 0.908] |
| Imhof–Wallimann pipeline (alone) | 0.888 | [0.865, 0.911] |
| Award layer + bid layer (combined) | 0.962 | [0.954, 0.969] |
The frequent-loser flag adds +0.035 AUC over the Imhof–Wallimann benchmark alone (DeLong \(p = 0.014\)); the combined specification gains +0.096 to +0.098 AUC over the bid-layer benchmark (\(p < 0.001\)). The increment is not an outperformance claim — it shows the two layers are not measuring the same thing.
Why +0.035 and not 0.921 − 0.888 = 0.033
The DeLong test requires paired observations on identical firms, so the +0.035 increment is computed on the stricter same-sample subset with complete bid-layer features (\(N = 11{,}676\): Imhof full 0.846, FL 0.881). The table levels above are the larger 16,779-firm pool, whose raw gap is 0.033. The two differ only because the samples differ; the DeLong-valid number is +0.035.
Informational complements, not substitutes
The two screens carry non-redundant information about the same target. The screening stage functions as a credible Stage-1 gatekeeper for the forensic stage. Reforms that mandate operational bid-microdata archival expand the forensic stage, not the screening one.
Operational metrics under temporal holdout¶
The temporal-holdout column is the headline reference (in-sample reported only for transparency):
| Top-\(k\) | Holdout TP | Holdout Precision | Holdout Recall | Holdout Lift | In-sample Lift |
|---|---|---|---|---|---|
| 50 | 1 | 0.020 | 0.005 | 1.7× | 26.2× |
| 100 | 7 | 0.070 | 0.036 | 6.1× | 14.8× |
| 250 | 19 | 0.076 | 0.098 | 6.6× | 14.0× |
| 500 | 35 | 0.070 | 0.181 | 6.1× | 11.5× |
| 1,000 | 66 | 0.066 | 0.342 | 5.8× | 8.5× |
The in-sample columns over-state precision by approximately 50% at top-500 because roughly 47% of the in-sample top-500 ranking comes from 2017–2019 participation, after CADE adjudications were already underway for some cartels. All operational claims in the paper are read off the holdout column.
3. Within-Stratum Bridge: How Cobidders Behave¶
The cobidder population behaves like the modeled cover-bidder type along five firm-level and bid-level predictions.
Firm-level (4 of 5 predictions confirmed)¶
| Prediction | Cohen's \(d\) | Interpretation |
|---|---|---|
| Cobidders deploy at higher intensity than FL non-cobidders | \(d = 0.97\) on unique winners faced | Confirmed |
| Cobidders bid proximally to direct CADE defendants | \(d = 1.49\) | Confirmed |
| Cobidders concentrate in narrower markets (portfolio HHI) | \(d = 0.61\) | Confirmed |
| Cobidders show repeat-pair structure with adjudicated firms | excess persistent pairs \(p < 0.001\) | Confirmed |
| Cobidders' tenure distribution distinguishes from FL non-cobidders | small effect | Mixed |
Bid-level (signature of credible cover bidding)¶
At the bid level, cobidders bid plausibly close to winners with elevated within-firm cross-bid dispersion:
| Statistic | Cobidders | FL non-cobidders | Gap (Cohen's \(d\)) |
|---|---|---|---|
| Median per-firm gap to winner (log) | 0.582 | 0.809 | \(d = -0.281\) (\(p < 10^{-6}\)) |
| Per-firm bid SD | 1.207 | 1.099 | \(d = +0.147\) |
A multivariate logit holding \(\log(1+\text{tenders\_count})\) constant confirms both signs at \(p < 10^{-3}\). The directions are consistent with credible cover bidding (Marshall & Marx 2012, Asker 2010) — bids low enough to be plausible, dispersed enough to track rotation across cover roles — rather than the textbook deliberately-uncompetitive cover bid the early literature described.
4. Pricing Imprint as Descriptive Corroboration¶
The paper does not rest on this section. The screening contribution is carried by Sections 1 and 2 above. The pricing imprint is reported because the cover-bidding literature would expect a price footprint, but the available identification cannot settle whether the association is causal.
Broad-sample association¶
| Specification | Coefficient | SE | Effect (%) | \(N\) |
|---|---|---|---|---|
| (1) Item + Year FE | 0.068*** | (0.022) | +6.8% | 1,654,401 |
| (2) Item + Year + PBU FE | 0.064*** | (0.020) | +6.4% | 1,654,401 |
| (3) Pregão only (all FE) | 0.089*** | (0.025) | +9.3% | 1,334,729 |
| (4) Convite only (all FE) | 0.037 | (0.022) | +3.8% | 319,718 |
Cross-fit (0.036), CEM matching (0.077), IPW (0.055), and a leave-one-out IV bound the conditional range at +3.6% to +7.7%.
Sign reversal under overlap restriction¶
Restricting comparisons to cells where FL-present and -absent items genuinely overlap on observables and reweighting to the average treatment effect on the treated produces a reversed coefficient: overlap-cell ATT \(-0.097\) (\(p < 0.001\)); the negative sign is robust to propensity-score trimming (full coefficients in Online Appendix B). The reversal is real; the paper does not rest on either sign.
The within-quartile decomposition:
| Quintile | Broad-sample \(\beta\) | \(p\)-value | ATT |
|---|---|---|---|
| Q1 | \(-0.065\) | \(< 10^{-15}\) | similar |
| Q2 | \(-0.057\) | \(< 10^{-15}\) | similar |
| Q3 | \(-0.040\) | \(< 10^{-15}\) | similar |
| Q4 (largest tender value) | +0.046 | 0.012 | +0.041 (\(p = 0.045\)) |
Q4 — the segment where deployment value is highest a priori on contract size — carries the positive imprint that the framework predicts the cartel concentrates on. Q1–Q3 carry negative imprints consistent with several non-exclusive non-screening mechanisms (thin-market price formation, item-characteristic selection, composition heterogeneity in the firm pool).
The discrimination evidence (Section 1) and the architectural test (Section 2) carry the contribution; the broad-sample imprint is bracketed by Cinelli–Hazlett \(RV_{q=1} = 17.5\%\) and Oster \(\hat{\delta}\) against selection on observables already absorbed by the fixed effects.
Why this section is descriptive
The screening-value formalization that motivates why the broad-sample \(\beta\) remains an economic object under coarsened observability is in Online Appendix A (Proposition 3); the body of the paper does not lean on that formalization. The screen is validated in Sections 1–2; the price imprint here is corroboration that the construct also tracks an outcome the cover-bidding literature would expect, not identification of a causal effect.
5. Heterogeneity Across Detection Regimes¶
Where the screening signal varies tells us something the level does not. Splitting the sample into quartiles of procuring-unit size:
| Quartile | FL price coefficient | Interpretation |
|---|---|---|
| Q1 (smallest buyers, weakest oversight) | +21.4% | Strongest signal |
| Q2 | +9.8% | |
| Q3 | +4.5% | |
| Q4 (largest buyers, strongest oversight) | +1.7% | Weakest signal |
A 12.5× extreme-quartile gradient. The framework predicts the contrast direction through the detection-cost comparative static \(\partial m^*/\partial \theta_k < 0\): cover-bidder deployment is more aggressive where the principal cost of detection is lower.
Scope information, not institutional channel identification
Buyer size proxies for several correlated institutional features — procurement-officer tenure (Coviello & Mariniello 2014), internal-audit infrastructure, item composition, discretionary procedure use (Decarolis et al. 2025) — none separable with the variation we have. We read the gradient as heterogeneity in the screening object across the monitoring environment, in the framework's predicted direction, with the magnitude concentrated at the extremes — not as identification of an institutional channel.
6. Modal Asymmetry: Pregão vs. Convite (Scope Information)¶
On the discrimination side, the modal-by-modal AUC against the 193 adjudicated cobidders is 0.952 in pregão primary auctions versus 0.816 in convite primary auctions (bootstrap difference \(-0.136\), \(p \approx 0\)).
Scope information for the screen, not a positive test of any institutional mechanism
The convite minimum-bidder rule (Lei 8.666/93, Art. 22 §3) would predict a sharper screening signal where the rule binds (convite). The data reverse this prediction. We do not read the reversal as a positive test of any alternative institutional theory. The modal asymmetry is reported as scope information for the screening object — the construct discriminates better in pregão environments — and not as a positive test of the minimum-bidder-rule mechanism, which would require institutional variation the BEC setting does not deliver.
Summary: What the Empirical Strategy Establishes¶
The paper makes four empirical claims, in declining order of confidence:
- The architecture works. 83% reduction in forensic pool with 68% recall preserved. (Headline.)
- The screen discriminates. Firm-level AUC 0.864 under temporal holdout against adjudicated cobidders.
- The screen complements bid-distribution methods. +0.035 AUC over Imhof–Wallimann pipeline (\(p = 0.014\)).
- A pricing imprint is present in the broad sample. +3.6% to +7.7% across four estimators, with sign reversal under overlap and Q4-concentrated positive — reported descriptively, not causally.
Claims 1–3 are the paper. Claim 4 is corroboration.