Replication

This page describes how to replicate the results presented in the paper.

Replication Package

The full replication package includes all code, processed datasets, and manuscript source files needed to reproduce every table and figure in the paper.

Repository Structure

The replication materials are organized in version-specific directories. The v4 directory contains the current primary analysis in R, while v2 and v3 contain legacy Stata implementations.

Software Requirements

Primary Analysis (v4 -- R)

Software	Version	Purpose
R	4.5+	Statistical computing
`fixest`	latest	High-dimensional fixed effects estimation
`data.table`	latest	Fast data manipulation
`modelsummary`	latest	Regression tables
`ggplot2`	latest	Publication-quality figures
`arrow`	latest	Reading Parquet files

Additional R packages: kableExtra, sandwich, lmtest, broom, scales, sf, viridis.

Legacy Analysis (v2/v3 -- Stata)

Software	Version	Purpose
Stata/SE	17+	Statistical computing
`reghdfe`	latest	High-dimensional fixed effects
`ftools`	latest	Fast Stata tools

Manuscript

Software	Version	Purpose
LaTeX	TeX Live 2024+	Document typesetting
`elsarticle`	latest	Journal document class
`chicago`	latest	Bibliography style
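Before running anything, it can help to confirm that the required command-line tools are on the PATH. The following is a minimal shell sketch, not part of the replication package: the helper name `missing_tools` is hypothetical, and the executable names in the example call are taken from commands used elsewhere on this page and may differ on your installation. It checks only that the programs exist, not their versions or the installed R, Stata, or LaTeX packages.

```shell
# Report which of the given commands are not on PATH.
# Prints one missing command per line; returns non-zero if any is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example call (executable names are assumptions; adjust to your setup):
# missing_tools Rscript stata-se pdflatex bibtex
```

An empty output with exit status 0 means every listed tool was found.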

Data Sources

Primary Dataset

BEC-G65-WORK1.parquet

Source: Bolsa Eletrônica de Compras (BEC), the São Paulo state electronic procurement platform
Format: Apache Parquet (~125 MB)
Observations: 479,330 bids
Variables: 180 columns
Coverage: All bids for BEC Group 65 (medical, dental, and hospital supplies), January 2009 -- December 2019
Unit: Bid level (firm x item x procurement event)

Data Access

The BEC procurement data is publicly available through the São Paulo state transparency portal. The processed dataset used in the analysis is included in the replication package.
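A quick integrity check on the dataset can catch a truncated download before R is involved. The sketch below relies on the fact that an unencrypted Parquet file begins and ends with the four ASCII bytes `PAR1`. The helper name `is_parquet` is hypothetical, and the file location in the example call is a placeholder, since this page does not state where the file sits inside the package.

```shell
# Return 0 if the file starts and ends with the 4-byte Parquet magic "PAR1".
# This checks only the magic bytes, not the schema or the row count.
is_parquet() {
  [ -f "$1" ] || return 1
  [ "$(head -c 4 "$1")" = "PAR1" ] && [ "$(tail -c 4 "$1")" = "PAR1" ]
}

# Example call (the path is a placeholder):
# is_parquet path/to/BEC-G65-WORK1.parquet && echo "looks like Parquet"
```

A file that passes this check can still be incomplete in other ways, so the observation and column counts listed above remain the stronger test once the data is loaded.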

Key Variables

Variable	Description
`purchase_type`	0 = Ordinary, 1 = Administrative, 2 = Litigated
`bid_price_ref`	Reference price (maximum the government will pay), in BRL
`bid_price`	Negotiated (final) bid price, in BRL
`bid_qty`	Quantity demanded in tender notice
`n_firms_bids`	Number of distinct firms submitting bids
`po_item_winner`	Tender success indicator (1 = successful purchase)
`item_id`	Product identifier (used for item FE)
`pbu_code`	Public buyer unit code (used for PBU FE and clustering)
`year_n`	Year (used for time FE)

A full data dictionary is available in the replication package (DATA_DICTIONARY.md).

Running the Analysis

Quick Start (v4 -- R)

# 1. After cloning the repository, navigate to the project root
cd paper1-bitter-pills

# 2. Run the full v4 pipeline (~5 minutes on 16 cores)
Rscript v4/run_all.R

This single command executes all analysis scripts in sequence:

00_prepare_data.R -- Load and prepare the dataset (cached at /tmp/v4_prepared.rds)
01_desc_stats.R -- Descriptive statistics
02_balance_table.R -- Balance table for urgent subsample
03_main_regressions.R -- Main regression tables (4 FE specs x 3 clustering variants)
04_heterogeneity.R -- Heterogeneous effects analysis
05_fiscal_costs.R -- Fiscal cost estimates
06_robustness.R -- Robustness checks (120+ regressions)
07_graphs.R -- Generate all figures
08_pub_tables.R -- Publication-ready LaTeX tables
09_pub_figures.R -- Publication-ready PDF figures
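The sequencing described above can be approximated from the shell when you want to stop at the first failing step. This is only a sketch of the idea, not the contents of `run_all.R`: the helper `run_in_sequence` is hypothetical, and the `v4/analysis/` location of the scripts is assumed from the individual-script commands on this page.

```shell
# Run each script in order with the given runner; stop at the first failure.
# Usage: run_in_sequence RUNNER SCRIPT...
run_in_sequence() {
  runner=$1
  shift
  for script in "$@"; do
    echo "==> $script"
    "$runner" "$script" || { echo "FAILED: $script" >&2; return 1; }
  done
}

# Example call; the glob expands in lexical order, so 00_ runs before 01_, etc.
# It assumes no other files in the directory match the pattern.
# run_in_sequence Rscript v4/analysis/0[0-9]_*.R
```

Because the runner is passed as an argument, a dry run is possible with `run_in_sequence echo ...`, which prints the order without executing anything.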

Running Individual Scripts

# Must run data preparation first
Rscript v4/analysis/00_prepare_data.R

# Then any individual analysis script
Rscript v4/analysis/03_main_regressions.R
Rscript v4/analysis/07_graphs.R
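Since the individual scripts depend on the prepared data, a small wrapper can run the preparation step only when its cache is absent. The sketch below assumes the cache path `/tmp/v4_prepared.rds` mentioned in the pipeline description; the helper names `cache_ready` and `run_script` and the `RSCRIPT` override are hypothetical. It does not detect a stale cache, so delete the file by hand if the source data or the preparation script has changed.

```shell
# Prepared-data cache path, as stated in the pipeline description above.
CACHE=${CACHE:-/tmp/v4_prepared.rds}

# Return 0 if the given file exists and is non-empty.
cache_ready() {
  [ -s "$1" ]
}

# Run one analysis script, preparing the data first if the cache is absent.
# RSCRIPT can be overridden (for example RSCRIPT=echo for a dry run).
run_script() {
  cache_ready "$CACHE" || ${RSCRIPT:-Rscript} v4/analysis/00_prepare_data.R || return 1
  ${RSCRIPT:-Rscript} "$1"
}

# Example call:
# run_script v4/analysis/03_main_regressions.R
```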

Legacy Analysis (Stata)

# Install required Stata packages
stata-se -b -q do v2/analysis/install_packages.do

# Run the v3 pipeline
bash v3/analysis/run_all.sh

Output Files

Tables

Directory	Format	Count	Description
`v4/pub/tables/`	LaTeX (.tex)	17	Publication-ready tables (booktabs + threeparttable)
`v4/manuscript/`	LaTeX (.tex)	59	Full regression tables (tabularray format)
`v4/results/`	HTML (.html)	59	Browser-viewable tables

Figures

Directory	Format	Count	Description
`v4/pub/figures/`	PDF	8	Publication-ready figures (grayscale, 6.5 x 4 in)
`v4/graphs/`	PDF	8	Color figures for presentations
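After a full run, the file counts in the tables above give a simple way to check that nothing was skipped. The following sketch counts files by extension and compares the result with an expected number. The helper names `count_files` and `check_count` are hypothetical, and the sketch assumes the outputs sit directly inside each directory rather than in subfolders, and that the directory holds no unrelated files with the same extension.

```shell
# Count files with a given extension directly inside a directory.
count_files() {
  find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Compare the actual count with the expected one and report the result.
# Usage: check_count DIR EXTENSION EXPECTED
check_count() {
  actual=$(count_files "$1" "$2")
  if [ "$actual" -eq "$3" ]; then
    echo "ok   $1 ($actual .$2)"
  else
    echo "DIFF $1 (expected $3 .$2, found $actual)"
    return 1
  fi
}

# Example calls, using the counts listed in the tables above:
# check_count v4/pub/tables  tex  17
# check_count v4/results     html 59
# check_count v4/pub/figures pdf  8
# check_count v4/graphs      pdf  8
```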

Manuscript

# Compile the manuscript
cd v5/manuscript/paper
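# First pass writes the .aux file that bibtex reads
pdflatex -interaction=nonstopmode main.tex
# Build the bibliography from the citations in main.aux
bibtex main
# Two more passes resolve citations and cross-references
pdflatex -interaction=nonstopmode main.tex
pdflatex -interaction=nonstopmode main.tex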

Computational Environment

The analysis was developed and tested on the following system:

Component	Specification
OS	Ubuntu 24.04 (WSL2 on Windows)
CPU	16 cores
RAM	15 GB
R	4.5
fixest	Uses OpenMP for parallel estimation (16 threads)

Runtime

The full v4 pipeline (run_all.R) takes approximately 5 minutes on the reference system. The most time-intensive step is the robustness analysis (06_robustness.R), which estimates 120+ regressions across three winsorization levels.