Replication
This page describes how to replicate the current results. The active version of the paper is v10-causal-mechanism, the JPubE short paper (May 2026) titled "Sourcing under Sanctions: Judicial Urgency and Pharmaceutical Procurement Costs".
Repository Structure
paper1-bitter-pills/
├── v10-causal-mechanism/ # ACTIVE (JPubE short paper)
│ ├── analysis/ # R + Python scripts (40_… 54_…) + _macros.R
│ ├── manuscript/paper/ # main.tex, OnlineAppendix.tex, *.tex sections, values.tex
│ ├── output/figures/ # vector PDFs (sourcing-vs-pricing, event study, …)
│ ├── output/tables/ # generated .tex / .csv tables
│ ├── build_v10.sh # regenerate outputs + compile main + appendix
│ └── V9_CHANGELOG.md # detailed build and revision log
├── v8-sourcing-reframe/ # frozen — earlier sourcing reframe
├── v7-r2round1/ # frozen — referee R2 round 1 baseline
├── v4/ # legacy R pipeline (prepares the input data cache)
├── docs/ # MkDocs site source (this site)
└── deploy.sh # build site + push to darciogm.github.io
The v4 pipeline prepares the input data cache (/tmp/v4_prepared.rds) that the
v10 analysis scripts read; v10 builds its analysis on that prepared dataset
rather than re-deriving it from the raw BEC data.
Software Requirements
Primary Analysis (R + Python)
| Package | Purpose |
|---|---|
| R 4.5+ | Statistical computing |
| fixest | High-dimensional fixed-effects estimation (lean=TRUE by default) |
| data.table | Fast data manipulation |
| ggplot2 | Publication figures (grayscale, serif) |
| arrow / duckdb | Parquet I/O (DuckDB is the default engine for parquet) |
| Python 3 + duckdb / pyarrow | Classifier macros and presentation tables |
Lee trimming bounds and the Rademacher wild-cluster bootstrap are implemented
manually in the v10 scripts. The HonestDiD package requires the CVXR/clarabel
system dependencies; where these are unavailable, the Honest-DiD sensitivity
analysis is computed with a manual linear-extrapolation fallback that yields
the same diagnostic verdict.
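The v10 scripts implement the wild-cluster bootstrap by hand in R. Purely as an illustration, the Python/NumPy sketch below shows one common variant: the restricted (null-imposed) Rademacher wild-cluster bootstrap of a two-sided t-test on a single OLS coefficient. Assumptions not taken from the v10 code: plain OLS with no absorbed fixed effects, CR1 cluster-robust standard errors, and the hypothetical function names `ols_cluster_t` and `wild_cluster_bootstrap_p`.

```python
import numpy as np


def ols_cluster_t(y, X, clusters, j):
    """Return the OLS coefficient on column j and its CR1 cluster-robust t-statistic."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        in_g = clusters == g
        score = X[in_g].T @ resid[in_g]  # cluster-level score vector
        meat += np.outer(score, score)
    G = groups.size
    c = (G / (G - 1)) * ((n - 1) / (n - k))  # CR1 small-sample factor
    vcov = c * (XtX_inv @ meat @ XtX_inv)
    return beta[j], beta[j] / np.sqrt(vcov[j, j])


def wild_cluster_bootstrap_p(y, X, clusters, j, n_boot=999, seed=12345):
    """Restricted Rademacher wild-cluster bootstrap p-value for H0: beta_j = 0 (sketch)."""
    rng = np.random.default_rng(seed)  # explicit seed: identical draws on every run
    _, t_obs = ols_cluster_t(y, X, clusters, j)
    # Impose the null by dropping column j and refitting the restricted model.
    X_r = np.delete(X, j, axis=1)
    beta_r, *_ = np.linalg.lstsq(X_r, y, rcond=None)
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r
    groups, cluster_index = np.unique(clusters, return_inverse=True)
    exceed = 0
    for _ in range(n_boot):
        # One Rademacher weight (+1 or -1) per cluster, shared by all its observations.
        v = rng.choice(np.array([-1.0, 1.0]), size=groups.size)
        y_star = fitted_r + v[cluster_index] * resid_r
        _, t_star = ols_cluster_t(y_star, X, clusters, j)
        exceed += int(abs(t_star) >= abs(t_obs))
    return t_obs, exceed / n_boot
```

Imposing the null when generating bootstrap samples is the usual recommendation when clusters are few; the fixed `seed` mirrors the explicit seeding described under Reproducibility.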
Manuscript
| Tool | Purpose |
|---|---|
| TeX Live 2024+ | LaTeX typesetting |
| elsarticle | Journal class (review format) |
| natbib + bibtex | Bibliography (NOT biblatex/biber in this project) |
| booktabs + threeparttable | Tables |
Data Sources
BEC-G65 — bid-level pharmaceutical procurement on the São Paulo state electronic procurement platform.
| Feature | Description |
|---|---|
| Source | Bolsa Eletrônica de Compras (BEC), São Paulo state |
| Coverage | Pharmaceutical purchases (BEC Group 65) |
| Period | January 2009 – December 2019 |
| Observations | 479,330 purchase-offer-item observations |
| Regimes | Ordinary; Administrative urgent; Litigated urgent |
The classifier operates at the purchase-order/tender-notice level; the empirical analysis is at the purchase-offer-item level after classified regimes are linked to BEC item records. Price regressions use accepted winning bids. The processed dataset is included in the replication package; raw BEC data is publicly available through the São Paulo state transparency portal.
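To make the change of level concrete, here is a minimal Python sketch of linking order-level regime labels to item-level records and then restricting the price sample to accepted winning bids. The record fields (`order_id`, `item_code`, `accepted_winner`), the function names, and the choice to drop items whose order carries no regime label are illustrative assumptions, not the replication package's schema.

```python
from dataclasses import dataclass

# Regime labels as named in the Data Sources table.
REGIMES = ("ordinary", "administrative_urgent", "litigated_urgent")


@dataclass
class ItemRecord:
    order_id: str          # hypothetical key to the purchase order / tender notice
    item_code: str         # hypothetical item identifier
    price: float
    accepted_winner: bool  # hypothetical flag: this is the accepted winning bid


def link_regimes(order_regime, items):
    """Attach each item to its order's regime; items on unclassified orders are dropped (assumption)."""
    linked = []
    for rec in items:
        regime = order_regime.get(rec.order_id)
        if regime is None:
            continue
        if regime not in REGIMES:
            raise ValueError(f"unknown regime {regime!r} for order {rec.order_id}")
        linked.append((rec, regime))
    return linked


def price_sample(linked):
    """Keep only accepted winning bids, the observations used in price regressions."""
    return [(rec, regime) for rec, regime in linked if rec.accepted_winner]
```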
Selection and the Lee bounds
The administrative urgent channel is selected and larger: it is the closest feasible urgent-procurement comparison, not a randomized one. The Lee trimming bounds discipline that selection. Within item × year × PBU strata, the overrepresented administrative group is trimmed from the high and low tails of its price distribution, producing lower and upper bounds for the litigated-over-administrative price gap under a monotonicity restriction. A parametric (Heckman-type) selection correction is non-informative in this design and is reported only as a diagnostic.
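A minimal single-stratum sketch of the trimming step in Python/NumPy. It assumes the trimming share is supplied as an input (in Lee's original formulation it is the overrepresented group's excess observation rate divided by that group's rate) and leaves out how the v10 script computes shares and aggregates across item × year × PBU strata; `lee_bounds_stratum` is a hypothetical name.

```python
import numpy as np


def lee_bounds_stratum(lit_prices, adm_prices, trim_share):
    """Sketch: Lee bounds on mean(litigated) - mean(administrative) within one stratum.

    trim_share is the share of the overrepresented administrative group to trim;
    here it is a given input (assumption), not computed from the data.
    """
    if not 0.0 <= trim_share < 1.0:
        raise ValueError("trim_share must lie in [0, 1)")
    lit = np.asarray(lit_prices, dtype=float)
    adm = np.sort(np.asarray(adm_prices, dtype=float))
    k = int(np.floor(trim_share * adm.size))  # number of administrative prices to drop
    drop_high = adm[: adm.size - k]  # k highest prices removed -> lower admin mean
    drop_low = adm[k:]               # k lowest prices removed -> higher admin mean
    lit_mean = lit.mean()
    lower = lit_mean - drop_low.mean()
    upper = lit_mean - drop_high.mean()
    return float(lower), float(upper)
```

Dropping the k highest administrative prices lowers the administrative mean and therefore gives the upper bound on the litigated-minus-administrative gap; dropping the k lowest gives the lower bound. With a zero trimming share both bounds collapse to the raw gap.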
Pipeline
The v10 analysis runs on top of the v4 prepared cache. The numbered scripts emit
macros into manuscript/paper/values.tex, which the LaTeX manuscript reads.
# 1. Prepare input data (one-off; ~1 minute)
Rscript v4/analysis/00_prepare_data.R
# 2. Regenerate outputs and compile (one command)
./v10-causal-mechanism/build_v10.sh
build_v10.sh runs the analysis scripts, checks that the required outputs exist,
and compiles main.pdf and OnlineAppendix.pdf. The scripts it runs include
40_utg_lee_bounds.R (Lee bounds), 43_rambachan_roth.R (BJS event study +
Honest-DiD), 44_wild_bootstrap.R (Rademacher wild-cluster bootstrap),
45_reconciliation.R (pricing-vs-sourcing decomposition),
46_procurement_cost_bound.R (fiscal procurement-cost calculation),
48_mechanism_evidence.R (within firm-buyer-item pricing, winner switching,
aggregation), and the Python classifier/table layer
(49_classifier_macros.py, 50_v9_outputs.py, 54_sample_flow_diagnostics.py).
Each numbered script writes its own block into values.tex, delimited by
auto-markers. The manuscript reads those macros, so every numerical claim,
table input, and figure path is regenerated by the script that owns it; the
manuscript itself contains no hardcoded numerals.
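One way to implement script-owned, marker-delimited blocks is an idempotent upsert: each script replaces its own block if present and appends it otherwise, so re-running a script never duplicates macros. The Python sketch below assumes a hypothetical marker syntax (`% >>> AUTO-BEGIN <owner>` / `% <<< AUTO-END <owner>`) and hypothetical function names; the real auto-markers in values.tex may differ.

```python
import re


def render_block(owner, macros):
    """Render one script's macros as \\newcommand lines between hypothetical markers."""
    lines = [f"% >>> AUTO-BEGIN {owner}"]
    # Sorted for a deterministic, diff-stable file; LaTeX macro names must be letters only.
    for name, value in sorted(macros.items()):
        lines.append("\\newcommand{\\%s}{%s}" % (name, value))
    lines.append(f"% <<< AUTO-END {owner}")
    return "\n".join(lines)


def upsert_block(text, owner, macros):
    """Replace the block owned by `owner` if it exists, otherwise append it."""
    block = render_block(owner, macros)
    tag = re.escape(owner)
    pattern = re.compile(
        rf"^% >>> AUTO-BEGIN {tag}\n.*?^% <<< AUTO-END {tag}$",
        re.DOTALL | re.MULTILINE,
    )
    if pattern.search(text):
        # A function replacement keeps the backslashes in \newcommand literal.
        return pattern.sub(lambda _m: block, text, count=1)
    sep = "" if text == "" or text.endswith("\n") else "\n"
    return text + sep + block + "\n"
```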
Output Files
| Path | Content |
|---|---|
| v10-causal-mechanism/output/figures/ | Vector PDFs: pricing-vs-sourcing decomposition, BJS event study, Honest-DiD sensitivity, quantity-ratio density |
| v10-causal-mechanism/output/tables/ | Generated .tex tables: combined urgent-margins-and-Lee-bounds, within firm-buyer-item robustness, winner switching, placebo, dynamic sensitivity, procurement cost, classifier validation, sample construction, and more |
| v10-causal-mechanism/manuscript/paper/main.pdf | Compiled main paper (17 pp, JPubE short-paper review format) |
| v10-causal-mechanism/manuscript/paper/OnlineAppendix.pdf | Compiled Online Appendix (5 pp) |
Computational Environment
The analysis was developed and tested on DarcioWork (a WSL2 development workstation):
| Component | Specification |
|---|---|
| OS | Ubuntu (WSL2 on Windows) |
| CPU | Intel i7-1260P (12 cores / 14 threads visible to WSL2) |
| RAM | 21 GB |
| GPU | None (CPU-only) |
| R | 4.5 |
| fixest threads | setFixest_nthreads(12) |
| DuckDB threads | PRAGMA threads=12; PRAGMA memory_limit='14GB' |
Reproducibility
Scripts that draw random numbers (bootstrap) set explicit seeds. Re-running
the pipeline produces identical values.tex macro blocks, and a
LaTeX-only rebuild reproduces both PDFs without changing any estimate.
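A simple way to check the claim that a re-run leaves values.tex unchanged is to keep a copy from the previous run and compare file digests. The Python sketch below is an assumption about workflow, not part of build_v10.sh; `file_digest` and `same_file_contents` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_file_contents(path_a, path_b):
    """True when two files are byte-for-byte identical (by digest)."""
    return file_digest(Path(path_a)) == file_digest(Path(path_b))
```

For example, copy manuscript/paper/values.tex aside before running build_v10.sh again, then compare the saved copy with the regenerated file; a False result points to a macro block that changed between runs.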