Provisional / methodological — signs analytic; magnitude a synthetic surface (one anchor per platform), not a fitted curve; no closed-form
This note characterizes why the raw award-screen AUC inflates and gives a portable
leading-order sufficient statistic for the size of the inflation. It is the paper's
lead contribution in v25, promoted into the main body (Proposition in §4, proof in
appendix, full grids in the new Online Supplement). The two comparative-static signs
are analytic (proved in the rare-target regime, prop:inflation); the magnitude is
read from a synthetic surface anchored at one empirical point per platform — not an
estimated/fitted curve — and no closed-form inflation formula is claimed. The surface
uses only the published participation-volume distribution shape and base rates; no
confidential microdata enters any output. Where this page conflicts with the
paper or the changelog, the paper wins.
AN-044: The over-crediting bias as a characterized object — the paper's lead contribution (size-bias, CV-of-\(T\) leading-order sufficient statistic)¶
Intuition (plain-language)
The headline number — a raw screen that "ranks cobidders at 0.76" — looks like discrimination until you ask a plain question: would a firm that simply bids in more tenders land near a cartel more often, regardless of how it bids? It would, mechanically. The screen ranks firms by how often they lose, which tracks how often they participate; and a "cobidder" is, by construction, a firm that shared at least one tender-item with a defendant — which is also more likely the more it participates. So the score and the label are both functions of participation volume. The positive class is just the high-volume firms, reshuffled — the raw AUC is the probability that a (volume-weighted) positive firm participated more than a random negative one. That ordering exists before any conduct enters. The useful part is that this bias is predictable and measurable: it grows when firms differ a lot in how much they bid (high dispersion) and shrinks when the target is less rare. You can compute one number from the candidate pool alone — the coefficient of variation of participation volume — and read off, before opening any expensive bid file, how much of any raw AUC is just volume geometry. It is a diagnostic, not a fix: it tells you how inflated the screen is, not how to rescue it. And it explains why BEC and federal ComprasNet are two points on one curve, not the same null twice: federal's much rarer target makes the bias more complete, which is exactly why its exposure-only ranking edges out the raw score.
Question¶
The validation audits (AN-004, AN-027, AN-043) establish empirically that the raw award-screen AUC is mostly mechanical opportunity, not discrimination. This note asks the next question: can the over-crediting be characterized as a body-level object — a bias with a known mechanism, known comparative statics, and a leading-order sufficient statistic a practitioner can compute before committing forensic resources? In v25 the answer is the paper's lead contribution, promoted into the main body (Proposition in §4, proof in appendix, full grids in the new Online Supplement).
Design¶
- Mechanism. The award score is monotone in participation volume \(T\) (it is \(\log(1+T)\)). The contact-defined cobidder label is mechanically increasing in \(T\): \(\Pr(\text{cobidder}_i = 1 \mid T_i) = 1 - (1-\pi)^{T_i} \approx \pi T_i\) in the rare-target regime, where \(\pi\) is the per-tender-item defendant-contact rate. So both score and label load on \(T\): the positive class is the size-biased \(T\) distribution and the negative class is approximately \(F_T\), giving \(\text{AUC}_{\text{raw}} = \Pr(T_i > T_j)\) with \(T_i\) size-biased relative to a random negative \(T_j\) — a classic size-bias gap.
- Two signs (proved in the rare-target regime,
prop:inflation). (a) The size-bias gap is increasing in the dispersion of \(F_T\); for participation counts (Gamma / Poisson-mixed), \(\text{AUC}_{\text{raw}}\) increases in the coefficient of variation \(\text{CV}(T)\), and in the degenerate (no-dispersion) limit \(\text{AUC}_{\text{raw}} \to 0.5\) — there is nothing to rank. (b) As the base rate rises, the size-biasing saturates (\(1-(1-\pi)^T\) flattens) and \(\text{AUC}_{\text{raw}}\) declines toward the within-stratum (genuine-signal) value — so a lower base rate means more inflation. - Synthetic surface (magnitude — one empirical anchor per platform, not a fitted curve). Draw \(T_i\) from a family indexed by \(\text{CV}(T)\); generate \(y_i \sim \text{Bernoulli}(\min(\pi T_i, 1))\) with \(\pi\) swept to vary the base rate; with no injected within-stratum signal (\(\theta = 0\)) the opportunity-adjusted AUC sits at \(\approx 0.5\), so \(\Delta = \text{AUC}_{\text{raw}} - \text{AUC}_{\text{within}}\) is the pure size-bias inflation. The grid is a synthetic surface, not an estimated/fitted curve; each real platform contributes one empirical anchor point overlaid on it. The injection device reuses the existing permutation-power machinery. The two real platforms are overlaid using only the candidate pool's \(\text{CV}(T)\) and base rate — two scalars each, no microdata.
Results¶
The inflation law (signs) holds across the grid¶
| Comparative static | Predicted sign | Simulation |
|---|---|---|
| \(\partial \Delta / \partial\, \text{CV}(T)\) | \(> 0\) (more dispersion ⇒ bigger size-bias gap) | confirmed across the grid |
| \(\partial \Delta / \partial\, \text{base rate}\) | \(< 0\) (rarer target ⇒ more inflation) | confirmed across the grid |
| Degenerate \(T\) (no dispersion) | \(\text{AUC}_{\text{raw}} \to 0.5\) | confirmed (Δ → ≈ 0.02 at CV = 0.5) |
At the BEC base rate (\(\approx 3.9\%\)), the simulated raw AUC rises monotonically with \(\text{CV}(T)\) — from \(\approx 0.52\) at CV = 0.5 to \(\approx 0.84\) at CV = 5 — while the opportunity-adjusted AUC stays pinned near \(0.5\) throughout.
Both real platforms land on the predicted surface¶
| Quantity | BEC–SP | Federal ComprasNet | Predicted relation |
|---|---|---|---|
| \(\text{CV}(T)\) (always-loser pool) | 2.79 | 3.79 | federal more dispersed |
| Adjudicated base rate (AL-pool) | ≈ 3.9% | ≈ 0.5% | federal ≈ 10× rarer |
| Raw award AUC (locked) | 0.761 | 0.744 | comparable |
| Within-stratum AUC (genuine floor, locked) | 0.471 | 0.462 | both ≈ chance |
| Inflation \(\Delta\) (raw − within, locked) | ≈ 0.29 | ≈ 0.28 | comparable |
| Simulated raw AUC at the platform's \((\text{CV}, \text{base rate})\) | 0.768 | 0.808 | reproduces / brackets the locked raw |
| Exposure-only AUC (locked) | 0.713 | 0.754 | federal exposure-only ≥ raw |
The synthetic surface reproduces BEC's locked raw AUC (sim 0.768 vs locked 0.761) at
BEC's anchor coordinate, and both platforms sit on the predicted surface. The surface is
synthetic — anchored at one empirical point per platform, not a fitted/estimated curve. The
figure overlays both real anchor points (outputs/figures/fig_inflation_surface.pdf).
Why federal's exposure-only edges out raw — the lower-base-rate prediction
Federal's roughly 10× lower base rate makes the size-biasing more complete, so the inflation is, if anything, more of the raw number federally than on BEC. That is exactly why the federal exposure-only ranking (0.754), which never sees the score, edges out the raw score itself (0.744) — whereas on BEC the exposure axis only approaches the raw score. BEC and federal are therefore two coordinates on one synthetic inflation surface (anchored at one empirical point per platform, not a fitted curve) at different \((\text{CV}(T), \text{base rate})\) values, not "the same null twice." This is the generalizing content the characterization supplies.
Interpretation¶
The over-crediting is a characterized object — and the paper's lead contribution. The raw award-screen AUC inflates because score and contact-defined label are both monotone in participation volume, making the positive class the size-biased volume distribution. In v25 this is promoted into the main body (Proposition in §4, proof in appendix, full grids in the new Online Supplement) as the paper's lead contribution. The two comparative-static signs are analytic in the rare-target regime; the magnitude is rendered as a synthetic surface anchored at one empirical point per platform — not a fitted/estimated curve, and not asserted as a closed-form formula.
The leading-order sufficient statistic is portable — and is a diagnostic, not a fix. A practitioner can compute the candidate pool's \(\text{CV}(T)\) from award data alone, before opening any bid file, to bound how much of an uncorrected raw AUC is mechanical. This converts the paper's central caveat ("adjust for procurement opportunity or you over-credit the screen") into a portable diagnostic. It does not rescue the screen: the genuine within-stratum floor is \(\approx\) chance on both platforms either way, and the correction reveals that floor rather than lifting it.
What is full-power vs power-bounded. The size-bias / label-blind first-stage deflation is the part that replicates at full power across both platforms; the residual within-stratum floor (\(\approx\) chance) is power-bounded federally (≤ 0.55 unadjudicated at \(N^+ = 195\)). The two are kept separate: the inflation characterization is the robust part, the residual-floor null is the power-bounded part.
Scope of the claim. This characterizes the over-crediting mechanism — a statement about how a raw validation number is produced — which is the exposure-discipline (H3) hypothesis that is now Confirmed across two platforms. It does not promote the screen to a portable detection tool: the cross-platform detection umbrella stays provisional, and the federal CADE anchors partially overlap the BEC portfolio, so this tests portability of the inflation law, not an independent cartel test. Confidence is 🟡 yellow / provisional: the object is clean and the signs are proved, but the magnitude rests on a synthetic surface (one empirical anchor per platform, not a fitted curve) and the federal anchors partially overlap.
Bearing on hypotheses¶
- H:exposure-discipline (H3). Primary bearing — characterizes the over-crediting mechanism that H3 asserts (Confirmed across two platforms), turning the empirical null into a body-level characterized object with a leading-order sufficient statistic. Provisional/methodological support; does not change the H3 status (stays Confirmed).
- H:cobidder-concentration (H1). Explains why the raw concentration is generic co-participation arithmetic: the positive class is the size-biased volume distribution, so the raw AUC exists before any conduct.
Follow-ups¶
- A genuinely independent cartel anchor (non-overlapping cases, or another jurisdiction) would add a third empirical anchor on the synthetic inflation surface and test the law out of sample.
- Reporting the \(\text{CV}(T)\) leading-order sufficient statistic alongside any future award-screen validation as a standard pre-registration diagnostic.