
Replication

v8 banner

This page describes the v1-v4 reduced-form pipeline (scripts/00_master.R). The submission-ready v8 structural pipeline for JPubE lives in two directories: v7-jpube-tight/scripts/ (canonical BNE simulation, 45_bne_simulation.R) and v8-jpube/scripts/ (empirical bidder-count robustness, 46_bne_empirical_counts.R). See v8-jpube/manuscript/paper_v8.tex for the canonical paper. The Data availability section of the manuscript describes the replication materials: code, generated tables, figures, and non-confidential derived outputs. Raw BEC administrative records cannot be publicly redistributed under the research-access agreement.

Requirements

Component     Version
R             4.5+
fixest        0.12+
data.table    1.15+
arrow         14+
ggplot2       3.5+
scales        1.3+
grf           2.3+
quantreg      5.98+
gridExtra     2.3+
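Before running anything, it can help to compare the installed packages against the minimums listed above. The following is a minimal sketch using base R only; check_requirements is a hypothetical helper, not part of the project's scripts.

```r
# Minimum versions copied from the requirements table above.
required <- c(
  fixest = "0.12", data.table = "1.15", arrow = "14", ggplot2 = "3.5",
  scales = "1.3", grf = "2.3", quantreg = "5.98", gridExtra = "2.3"
)

# Hypothetical helper: for each package, report whether it is installed
# and whether the installed version meets the stated minimum.
check_requirements <- function(required) {
  pkgs <- names(required)
  # system.file() returns "" for a missing package, so nothing is loaded
  # just to inspect it.
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                      logical(1))
  found <- rep(NA_character_, length(pkgs))
  found[installed] <- vapply(
    pkgs[installed],
    function(p) as.character(utils::packageVersion(p)),
    character(1)
  )
  ok <- unname(installed)
  # compareVersion() returns -1, 0 or 1; >= 0 means "at least the minimum".
  ok[installed] <- vapply(
    which(installed),
    function(i) utils::compareVersion(found[i], required[[i]]) >= 0,
    logical(1)
  )
  data.frame(package = pkgs, minimum = unname(required),
             installed = found, ok = ok,
             row.names = NULL, stringsAsFactors = FALSE)
}

report <- check_requirements(required)
print(report)

if (getRversion() < "4.5") {
  message("R ", getRversion(), " is older than the stated minimum (4.5).")
}
```

Rows with ok equal to FALSE are either missing (installed is NA) or older than the listed minimum.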

Data

The primary dataset consists of administrative records from BEC (Bolsa Eletrônica de Compras), the electronic procurement platform of the state of São Paulo. The data contain 373 columns and cover all standardized goods procurement from January 2016 to December 2019.

Data access

The raw data files are not publicly available due to confidentiality agreements. Researchers interested in replication should contact the authors directly.

Pipeline

The full analysis pipeline runs from a single master script:

# From the project root directory
Rscript scripts/00_master.R

This executes the following scripts in sequence, each as a separate R subprocess:

Script             Purpose                                                        Duration
01_clean.R         CSV to parquet conversion, variable creation                   ~5 min (first run)
02_analysis.R      24 DiDiR regressions + 4 event studies                         ~30 sec
05_robustness.R    Placebo, alt. clustering, winsorization, permutation           ~45 sec
06_extensions.R    Real prices, extensive margin, efficiency, heterogeneity       ~30 sec
07_advanced.R      HonestDiD, Lee bounds, causal forest, quantile DiD, Gelbach    ~3 min
03_tables.R        18 LaTeX tables                                                ~5 sec
04_figures.R       15 PDF figures                                                 ~10 sec
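The run order above, with one R subprocess per script, can be sketched as follows. This is an illustration of the approach, not the contents of scripts/00_master.R; run_pipeline and its arguments are hypothetical names.

```r
# Execution order taken from the table above (tables and figures run last).
pipeline <- c(
  "01_clean.R", "02_analysis.R", "05_robustness.R", "06_extensions.R",
  "07_advanced.R", "03_tables.R", "04_figures.R"
)

# Hypothetical runner: launch each script in a fresh Rscript process so
# that memory is returned to the OS between steps, and stop at the first
# script that exits with a non-zero status.
run_pipeline <- function(scripts, dir = "scripts",
                         rscript = file.path(R.home("bin"), "Rscript")) {
  for (s in scripts) {
    path <- file.path(dir, s)
    message("Running ", path)
    started <- Sys.time()
    status <- system2(rscript, shQuote(path))
    if (!identical(as.integer(status), 0L)) {
      stop("Script failed with exit status ", status, ": ", path)
    }
    elapsed <- as.numeric(difftime(Sys.time(), started, units = "secs"))
    message(sprintf("Finished %s in %.1f sec", s, elapsed))
  }
  invisible(TRUE)
}

# Usage, from the project root directory:
# run_pipeline(pipeline)
```

Stopping at the first failure keeps 03_tables.R and 04_figures.R from running on stale or partial estimation output.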

Output Structure

output/
├── tables/           # 18 .tex files (threeparttable + booktabs)
│   ├── tab_desc_stats.tex
│   ├── tab_prices.tex
│   ├── tab_participants.tex
│   ├── tab_validbids.tex
│   ├── tab_distance.tex
│   ├── tab_placebo.tex
│   ├── tab_altcluster.tex
│   ├── tab_winsorize.tex
│   ├── tab_prices_real.tex
│   ├── tab_extensive.tex
│   ├── tab_efficiency.tex
│   ├── tab_sme_winner.tex
│   ├── tab_heterog_pbu.tex
│   ├── tab_heterog_value.tex
│   ├── tab_lee_bounds.tex
│   ├── tab_cforest.tex
│   ├── tab_quantile_did.tex
│   └── tab_mediation.tex
└── figures/          # 15 .pdf files (grayscale, cairo)
    ├── fig_01_logprices_es.pdf
    ├── fig_02_distance_es.pdf
    ├── fig_03_numfirms_es.pdf
    ├── fig_04_numbids_es.pdf
    ├── fig_05_trends_prices.pdf
    ├── fig_06_trends_firms.pdf
    ├── fig_07_trends_bids.pdf
    ├── fig_08_trends_distance.pdf
    ├── fig_09_permutation.pdf
    ├── fig_10_sme_share.pdf
    ├── fig_11_honestdid.pdf
    ├── fig_12_cforest_varimp.pdf
    ├── fig_13_cforest_gate.pdf
    ├── fig_14_quantile_did.pdf
    └── fig_15_mediation.pdf
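After a full run, a quick count of the generated files can confirm that the output tree matches the layout above. The helper below is a hypothetical sketch that assumes the output/ directory sits in the project root.

```r
# Hypothetical check: count .tex and .pdf files and compare them with the
# expected totals (18 tables, 15 figures) from the tree above.
check_outputs <- function(root = "output", n_tables = 18L, n_figures = 15L) {
  tables  <- list.files(file.path(root, "tables"),  pattern = "\\.tex$")
  figures <- list.files(file.path(root, "figures"), pattern = "\\.pdf$")
  c(tables_ok  = length(tables)  == n_tables,
    figures_ok = length(figures) == n_figures)
}

# Usage, from the project root directory:
# check_outputs()
```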

Manuscript Compilation

cd manuscript
pdflatex main.tex
bibtex main
pdflatex main.tex
pdflatex main.tex

Technical Notes

  • Memory management: Each pipeline script runs as a separate R subprocess to prevent OOM on systems with 15 GB RAM. The fixest lean estimation mode (setFixest_estimation(lean = TRUE)) reduces model storage from ~4 GB to ~2.5 MB.

  • Parquet cache: The first run reads the 6.4 GB CSV file and creates a parquet cache (~73 columns). Subsequent runs load directly from parquet (~5 seconds vs. ~5 minutes).

  • Thread configuration: Both fixest and data.table use 16 threads by default. Adjust in scripts/utils.R if running on a machine with fewer cores.
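The three notes above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the contents of scripts/utils.R or 01_clean.R: pick_threads, configure_session, load_cached and their arguments are hypothetical names, and the fixest, data.table and arrow packages must be installed for the last two functions to run.

```r
# Hypothetical helper: cap the thread count at the number of available
# cores, falling back to 1 when the core count is unknown.
pick_threads <- function(max_threads = 16L, cores = parallel::detectCores()) {
  if (is.na(cores)) return(1L)
  as.integer(min(max_threads, cores))
}

# Apply the thread count to both packages and switch fixest to lean mode,
# which drops large components from stored model objects.
configure_session <- function(max_threads = 16L) {
  n <- pick_threads(max_threads)
  data.table::setDTthreads(n)
  fixest::setFixest_nthreads(n)
  fixest::setFixest_estimation(lean = TRUE)
  invisible(n)
}

# Read-through cache: parse the CSV once, store it as parquet, and load
# the parquet file on every later call. keep_cols = NULL keeps all columns.
load_cached <- function(csv_path, parquet_path, keep_cols = NULL) {
  if (file.exists(parquet_path)) {
    return(data.table::as.data.table(arrow::read_parquet(parquet_path)))
  }
  dt <- data.table::fread(csv_path, select = keep_cols)
  arrow::write_parquet(dt, parquet_path)
  dt
}

# Usage (paths are placeholders):
# configure_session(max_threads = 8L)
# dt <- load_cached("path/to/raw.csv", "path/to/cache.parquet")
```

One consequence of this design is that the cache is never invalidated automatically: if the CSV or the column selection changes, the parquet file has to be deleted by hand to force a rebuild.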