Replication

This page describes how to replicate the current results. The active version of the paper is v10-causal-mechanism (JPubE short paper, May 2026): Sourcing under Sanctions: Judicial Urgency and Pharmaceutical Procurement Costs.

Repository Structure

paper1-bitter-pills/
├── v10-causal-mechanism/             # ACTIVE (JPubE short paper)
│   ├── analysis/               # R + Python scripts (40_… 54_…) + _macros.R
│   ├── manuscript/paper/       # main.tex, OnlineAppendix.tex, *.tex sections, values.tex
│   ├── output/figures/         # vector PDFs (sourcing-vs-pricing, event study, …)
│   ├── output/tables/          # generated .tex / .csv tables
│   ├── build_v10.sh             # regenerate outputs + compile main + appendix
│   └── V9_CHANGELOG.md         # detailed build and revision log
├── v8-sourcing-reframe/        # frozen — earlier sourcing reframe
├── v7-r2round1/                # frozen — referee R2 round 1 baseline
├── v4/                         # legacy R pipeline (prepares the input data cache)
├── docs/                       # MkDocs site source (this site)
└── deploy.sh                   # build site + push to darciogm.github.io

The v4 pipeline prepares the input data cache (/tmp/v4_prepared.rds) used by the v10 analysis scripts. The v10 analysis builds on that prepared dataset rather than re-deriving it from the raw BEC data.

Software Requirements

Primary Analysis (R + Python)

Package	Purpose
`R` 4.5+	Statistical computing
`fixest`	High-dimensional fixed-effects estimation (`lean=TRUE` default)
`data.table`	Fast data manipulation
`ggplot2`	Publication figures (grayscale, serif)
`arrow` / `duckdb`	Parquet I/O (DuckDB is the default engine for parquet)
`Python` 3 + `duckdb` / `pyarrow`	Classifier macros and presentation tables

Lee trimming bounds and the Rademacher wild-cluster bootstrap are implemented manually in the v10 scripts. HonestDiD requires the CVXR/clarabel system dependencies; where these are unavailable, the Honest-DiD sensitivity analysis falls back to a manual linear extrapolation that produces the same diagnostic verdict.
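As a rough illustration of what a manual Rademacher wild-cluster bootstrap involves, the Python sketch below tests a zero slope in a bivariate regression, drawing one ±1 weight per cluster and imposing the null when resampling. It is not the code in 44_wild_bootstrap.R: the function names, the bivariate model, the absence of small-sample corrections, and the seed value are all assumptions made for the example.

```python
import random
from collections import defaultdict


def slope_t(x, y, cluster):
    """t-statistic of the OLS slope with a cluster-robust standard error
    (no small-sample correction; illustrative only)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    xd = [xi - xbar for xi in x]
    sxx = sum(v * v for v in xd)
    b = sum(v * (yi - ybar) for v, yi in zip(xd, y)) / sxx
    a = ybar - b * xbar
    # Sum the score contributions x_dev * residual within each cluster.
    scores = defaultdict(float)
    for xi, v, yi, g in zip(x, xd, y, cluster):
        scores[g] += v * (yi - a - b * xi)
    se = sum(s * s for s in scores.values()) ** 0.5 / sxx
    return b / se


def wild_cluster_p(x, y, cluster, reps=999, seed=12345):
    """Two-sided bootstrap p-value for H0: slope = 0."""
    rng = random.Random(seed)  # explicit seed so reruns give the same p-value
    t_obs = slope_t(x, y, cluster)
    ybar = sum(y) / len(y)
    resid0 = [yi - ybar for yi in y]  # residuals of the restricted (null) model
    groups = sorted(set(cluster))
    exceed = 0
    for _ in range(reps):
        # One Rademacher weight per cluster, shared by all its observations.
        w = {g: rng.choice((-1.0, 1.0)) for g in groups}
        y_star = [ybar + w[g] * u for g, u in zip(cluster, resid0)]
        if abs(slope_t(x, y_star, cluster)) >= abs(t_obs):
            exceed += 1
    return exceed / reps
```

Flipping signs at the cluster level, rather than per observation, preserves the within-cluster dependence of the residuals, which is the point of the wild-cluster variant.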

Manuscript

Tool	Purpose
TeX Live 2024+	LaTeX typesetting
`elsarticle`	Journal class (review format)
`natbib` + `bibtex`	Bibliography (NOT biblatex/biber in this project)
`booktabs` + `threeparttable`	Tables

Data Sources

BEC-G65 — bid-level pharmaceutical procurement on the São Paulo state electronic procurement platform.

Feature	Description
Source	Bolsa Eletrônica de Compras (BEC), São Paulo state
Coverage	Pharmaceutical purchases (BEC Group 65)
Period	January 2009 – December 2019
Observations	479,330 purchase-offer-item observations
Regimes	Ordinary; Administrative urgent; Litigated urgent

The classifier operates at the purchase-order/tender-notice level; the empirical analysis is at the purchase-offer-item level after classified regimes are linked to BEC item records. Price regressions use accepted winning bids. The processed dataset is included in the replication package; raw BEC data is publicly available through the São Paulo state transparency portal.
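The two-level structure described above can be sketched as a simple join: regimes classified per purchase order are attached to item-level records, and the price sample then keeps accepted winning bids. The field names (`order_id`, `accepted`, `winner`, `regime`) and the choice to drop unclassified orders are hypothetical, chosen only for illustration; the actual schema and linking rules live in the replication scripts.

```python
def link_regimes(items, order_regime):
    """Attach an order-level regime label to each item-level record.

    items: list of dicts, one per purchase-offer-item observation.
    order_regime: dict mapping an order identifier to its classified regime.
    """
    linked = []
    for row in items:
        regime = order_regime.get(row["order_id"])
        if regime is None:
            continue  # assumption: unclassified orders are dropped in this sketch
        linked.append({**row, "regime": regime})
    return linked


def price_sample(linked):
    """Keep accepted winning bids, the sample used for price regressions."""
    return [r for r in linked if r["accepted"] and r["winner"]]
```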

Selection and the Lee bounds

The administrative urgent channel is a selected, larger comparison group: the closest feasible urgent-procurement comparison, not a randomized one. The Lee trimming bounds discipline that selection. Within item × year × PBU strata, the overrepresented administrative group is trimmed from the high tail and, separately, from the low tail of its price distribution, which yields lower and upper bounds for the litigated-over-administrative gap under a monotonicity restriction. A parametric (Heckman-type) selection correction is non-informative in this design and is reported only as a diagnostic.
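The trimming step can be sketched for a single stratum as follows. This is a minimal Python illustration, not the code in 40_utg_lee_bounds.R: the function name is hypothetical, the trimming share is taken as a given input (how it is computed per stratum is not specified here), and the gap is a plain difference in means.

```python
from statistics import fmean


def lee_bounds_gap(litigated, administrative, trim_share):
    """Bounds on mean(litigated) - mean(administrative) within one stratum.

    trim_share: fraction of the overrepresented administrative group to trim
    (assumed to be supplied by the caller in this sketch).
    """
    adm = sorted(administrative)
    k = int(trim_share * len(adm))  # number of observations to trim
    if k >= len(adm):
        raise ValueError("trim_share removes the whole administrative group")
    mean_lit = fmean(litigated)
    # Trimming the most expensive administrative purchases lowers the
    # administrative mean, which gives the upper bound on the gap.
    upper = mean_lit - fmean(adm[: len(adm) - k])
    # Trimming the cheapest ones raises it, which gives the lower bound.
    lower = mean_lit - fmean(adm[k:])
    return lower, upper
```

With `trim_share=0` both bounds collapse to the raw difference in means, so the width of the interval reflects how much selection the trimming has to absorb.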

Pipeline

The v10 analysis runs on top of the v4 prepared cache. The numbered scripts emit macros into manuscript/paper/values.tex, which the LaTeX manuscript reads.

# 1. Prepare input data (one-off; ~1 minute)
Rscript v4/analysis/00_prepare_data.R

# 2. Regenerate outputs and compile (one command)
./v10-causal-mechanism/build_v10.sh

build_v10.sh runs the analysis scripts, checks required outputs, and compiles main.pdf and OnlineAppendix.pdf. The scripts it runs include:

Script	Role
`40_utg_lee_bounds.R`	Lee bounds
`43_rambachan_roth.R`	BJS event study + Honest-DiD
`44_wild_bootstrap.R`	Rademacher wild-cluster bootstrap
`45_reconciliation.R`	Pricing-vs-sourcing decomposition
`46_procurement_cost_bound.R`	Fiscal procurement-cost calculation
`48_mechanism_evidence.R`	Within firm-buyer-item pricing, winner switching, aggregation
`49_classifier_macros.py`, `50_v9_outputs.py`, `54_sample_flow_diagnostics.py`	Python classifier/table layer
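A required-outputs check of the kind the build script performs might look like the Python sketch below. The helper name and the idea of flagging empty files are assumptions for illustration; the actual list of required files and the check itself are defined in build_v10.sh.

```python
from pathlib import Path


def missing_outputs(root, required):
    """Return a list of problems for required output files under root.

    required: iterable of paths relative to root (hypothetical list,
    supplied by the caller).
    """
    problems = []
    for rel in required:
        path = Path(root) / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif path.stat().st_size == 0:
            problems.append(f"empty: {rel}")
    return problems
```

An empty return value means every listed file exists and is non-empty, so a caller can stop before the LaTeX compile step whenever the list is not empty.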

Each numbered script emits a block into values.tex, delimited by auto-markers. The manuscript reads those macros, so every numerical claim, table input, and figure path is regenerated by the script that owns it. The manuscript contains no hardcoded numerals.
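One way a script could own a delimited block in a shared macro file is sketched below in Python. The marker syntax (`% BEGIN AUTO <owner>` / `% END AUTO <owner>`), the function names, and the macro names in the test are hypothetical; the real auto-marker format is whatever the v10 scripts write.

```python
import re


def render_block(owner, macros):
    """Render one owner's macros as a marker-delimited LaTeX block."""
    lines = [f"% BEGIN AUTO {owner}"]
    for name, value in sorted(macros.items()):
        lines.append(f"\\newcommand{{\\{name}}}{{{value}}}")
    lines.append(f"% END AUTO {owner}")
    return "\n".join(lines)


def update_block(text, owner, macros):
    """Replace the owner's existing block in text, or append a new one."""
    block = render_block(owner, macros)
    pattern = re.compile(
        rf"^% BEGIN AUTO {re.escape(owner)}\n.*?^% END AUTO {re.escape(owner)}$",
        flags=re.DOTALL | re.MULTILINE,
    )
    if pattern.search(text):
        # A callable replacement avoids backslash processing of the LaTeX text.
        return pattern.sub(lambda _match: block, text, count=1)
    separator = "" if text == "" or text.endswith("\n") else "\n"
    return text + separator + block + "\n"
```

Because each script only rewrites the span between its own markers, scripts can be rerun individually without disturbing the macros emitted by the others.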

Output Files

Path	Content
`v10-causal-mechanism/output/figures/`	Vector PDFs: pricing-vs-sourcing decomposition, BJS event study, Honest-DiD sensitivity, quantity-ratio density
`v10-causal-mechanism/output/tables/`	Generated `.tex` tables: combined urgent-margins-and-Lee-bounds, within firm-buyer-item robustness, winner switching, placebo, dynamic sensitivity, procurement cost, classifier validation, sample construction, and more
`v10-causal-mechanism/manuscript/paper/main.pdf`	Compiled main paper (17 pp, JPubE short-paper review format)
`v10-causal-mechanism/manuscript/paper/OnlineAppendix.pdf`	Compiled Online Appendix (5 pp)

Computational Environment

The analysis was developed and tested on DarcioWork (a WSL2 development workstation):

Component	Specification
OS	Ubuntu (WSL2 on Windows)
CPU	Intel i7-1260P (12 cores / 14 threads visible to WSL2)
RAM	21 GB
GPU	None (CPU-only)
R	4.5
`fixest` threads	`setFixest_nthreads(12)`
DuckDB threads	`PRAGMA threads=12; PRAGMA memory_limit='14GB'`

Reproducibility

Scripts that draw random numbers (bootstrap) set explicit seeds. Re-running the pipeline produces identical values.tex macro blocks, and a LaTeX-only rebuild reproduces both PDFs without changing any estimate.
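A simple way to confirm that a rerun leaves the macro file unchanged is to compare content hashes before and after. The Python sketch below assumes a caller-supplied `rerun` callable (for example, one that invokes the build script); both helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_after_rerun(path, rerun):
    """Run `rerun()` and report whether the file at `path` is byte-identical."""
    before = sha256_of(path)
    rerun()
    return sha256_of(path) == before
```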