Replication
v8 banner
This page describes the v1–v4 reduced-form pipeline (`scripts/00_master.R`). The JPubE submission-ready v8 structural pipeline lives in `v7-jpube-tight/scripts/` (canonical BNE simulation, `45_bne_simulation.R`) and `v8-jpube/scripts/` (empirical bidder-count robustness, `46_bne_empirical_counts.R`). See `v8-jpube/manuscript/paper_v8.tex` for the canonical paper. Replication materials (code, generated tables, figures, and non-confidential derived outputs) are described in the Data availability section of the manuscript; raw BEC administrative records are not publicly redistributable under the research-access agreement.
Requirements

| Component | Version |
|---|---|
| R | 4.5+ |
| `fixest` | 0.12+ |
| `data.table` | 1.15+ |
| `arrow` | 14+ |
| `ggplot2` | 3.5+ |
| `scales` | 1.3+ |
| `grf` | 2.3+ |
| `quantreg` | 5.98+ |
| `gridExtra` | 2.3+ |
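Before running anything, it can help to confirm that the local installation meets the minimum versions listed in the Requirements table. The following is a minimal sketch using base R only; `meets_minimum` and `find_outdated` are hypothetical helper names and are not part of the project's scripts.

```r
# Minimum versions copied from the Requirements table.
required <- c(
  fixest = "0.12", data.table = "1.15", arrow = "14", ggplot2 = "3.5",
  scales = "1.3", grf = "2.3", quantreg = "5.98", gridExtra = "2.3"
)

# TRUE when `installed` is at least `minimum`. numeric_version() compares
# component by component, so "5.100" counts as newer than "5.98".
meets_minimum <- function(installed, minimum) {
  numeric_version(installed) >= numeric_version(minimum)
}

# Hypothetical helper: names of packages that are missing or too old.
find_outdated <- function(required) {
  bad <- character(0)
  for (pkg in names(required)) {
    ok <- requireNamespace(pkg, quietly = TRUE) &&
      meets_minimum(as.character(utils::packageVersion(pkg)), required[[pkg]])
    if (!ok) bad <- c(bad, pkg)
  }
  bad
}

# Report problems without stopping, so the full list is visible at once.
if (!meets_minimum(as.character(getRversion()), "4.5")) {
  message("R ", as.character(getRversion()), " is older than the required 4.5")
}
outdated <- find_outdated(required)
if (length(outdated) > 0) {
  message("Install or update: ", paste(outdated, collapse = ", "))
}
```

The check only reports; it does not install anything, so it is safe to run on a shared machine.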
Data
The primary dataset is administrative data from BEC (Bolsa Eletronica de Compras), the electronic procurement platform of the state of Sao Paulo. It has 373 columns and covers all standardized-goods procurement from January 2016 to December 2019.
Data access
The raw data files are not publicly available due to confidentiality agreements. Researchers interested in replication should contact the authors directly.
Pipeline
The full analysis pipeline runs from a single master script:

```bash
# From the project root directory
Rscript scripts/00_master.R
```
This executes the following scripts in sequence, each as a separate R subprocess:
| Script | Purpose | Duration |
|---|---|---|
| `01_clean.R` | CSV to parquet conversion, variable creation | ~5 min (first run) |
| `02_analysis.R` | 24 DiDiR regressions + 4 event studies | ~30 sec |
| `05_robustness.R` | Placebo, alt. clustering, winsorization, permutation | ~45 sec |
| `06_extensions.R` | Real prices, extensive margin, efficiency, heterogeneity | ~30 sec |
| `07_advanced.R` | HonestDiD, Lee bounds, causal forest, quantile DiD, Gelbach | ~3 min |
| `03_tables.R` | 18 LaTeX tables | ~5 sec |
| `04_figures.R` | 15 PDF figures | ~10 sec |
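The sequence above, with each script launched in its own R subprocess, can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `scripts/00_master.R`: `run_script` and `run_pipeline` are hypothetical names, and stopping at the first failure is an assumed design choice.

```r
# Execution order taken from the pipeline table.
pipeline <- c(
  "01_clean.R", "02_analysis.R", "05_robustness.R", "06_extensions.R",
  "07_advanced.R", "03_tables.R", "04_figures.R"
)

# Default runner: launch one script in a fresh Rscript process and
# return its exit status (0 means success).
run_script <- function(path) {
  system2("Rscript", shQuote(path))
}

# Run the scripts in order, stopping at the first non-zero exit status.
# `runner` is injectable so the control flow can be exercised without R files.
run_pipeline <- function(scripts, dir = "scripts", runner = run_script) {
  for (s in scripts) {
    path <- file.path(dir, s)
    message("Running ", path)
    status <- runner(path)
    if (!identical(as.integer(status), 0L)) {
      stop("Script failed: ", path, " (exit status ", status, ")")
    }
  }
  invisible(TRUE)
}

# From the project root directory:
# run_pipeline(pipeline)
```

Because every script gets a fresh process, memory held by one stage is released before the next stage starts.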
Output Structure

```text
output/
├── tables/               # 18 .tex files (threeparttable + booktabs)
│   ├── tab_desc_stats.tex
│   ├── tab_prices.tex
│   ├── tab_participants.tex
│   ├── tab_validbids.tex
│   ├── tab_distance.tex
│   ├── tab_placebo.tex
│   ├── tab_altcluster.tex
│   ├── tab_winsorize.tex
│   ├── tab_prices_real.tex
│   ├── tab_extensive.tex
│   ├── tab_efficiency.tex
│   ├── tab_sme_winner.tex
│   ├── tab_heterog_pbu.tex
│   ├── tab_heterog_value.tex
│   ├── tab_lee_bounds.tex
│   ├── tab_cforest.tex
│   ├── tab_quantile_did.tex
│   └── tab_mediation.tex
└── figures/              # 15 .pdf files (grayscale, cairo)
    ├── fig_01_logprices_es.pdf
    ├── fig_02_distance_es.pdf
    ├── fig_03_numfirms_es.pdf
    ├── fig_04_numbids_es.pdf
    ├── fig_05_trends_prices.pdf
    ├── fig_06_trends_firms.pdf
    ├── fig_07_trends_bids.pdf
    ├── fig_08_trends_distance.pdf
    ├── fig_09_permutation.pdf
    ├── fig_10_sme_share.pdf
    ├── fig_11_honestdid.pdf
    ├── fig_12_cforest_varimp.pdf
    ├── fig_13_cforest_gate.pdf
    ├── fig_14_quantile_did.pdf
    └── fig_15_mediation.pdf
```
Manuscript Compilation

```bash
cd manuscript
pdflatex main.tex
bibtex main
pdflatex main.tex
pdflatex main.tex
```
Technical Notes

- Memory management: Each pipeline script runs as a separate R subprocess to prevent out-of-memory (OOM) failures on systems with 15 GB RAM. The `fixest` lean estimation mode (`setFixest_estimation(lean = TRUE)`) reduces model storage from ~4 GB to ~2.5 MB.
- Parquet cache: The first run reads the 6.4 GB CSV file and creates a parquet cache (~73 columns). Subsequent runs load directly from parquet (~5 seconds vs. ~5 minutes).
- Thread configuration: Both `fixest` and `data.table` use 16 threads by default. Adjust in `scripts/utils.R` if running on a machine with fewer cores.
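On a machine with fewer than 16 cores, the thread and lean-mode settings described above might be applied along these lines. This is a hedged sketch, not the contents of `scripts/utils.R`: `pick_threads` is a hypothetical helper, and capping the thread count at the detected core count is an assumption about what a sensible adjustment looks like.

```r
# Cap the requested thread count at the number of cores actually available.
pick_threads <- function(requested = 16L, available = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to one thread.
  if (is.na(available) || available < 1L) return(1L)
  as.integer(min(requested, available))
}

n_threads <- pick_threads()

# Apply the setting only to packages that are installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDTthreads(n_threads)
}
if (requireNamespace("fixest", quietly = TRUE)) {
  fixest::setFixest_nthreads(n_threads)
  # Lean mode drops large objects from fitted models to save memory.
  fixest::setFixest_estimation(lean = TRUE)
}
```

Lean models are smaller but cannot support every post-estimation method, so this trade-off suits batch pipelines more than interactive exploration.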