Designing Reproducible Data Pipelines for Community Research

Sat, 09 Mar 2024 00:00:00 +0000

In the first post of this series, I argued that reproducibility is not a technical luxury for community research institutions; it is an ethical and operational obligation. In this post, I want to move from philosophy to plumbing, because that is where reproducibility becomes real.

Specifically: what does it mean to design reproducible data pipelines in a community research environment?

At the UNC Charlotte Urban Institute, this question became concrete as we built the Quality of Life Explorer, developed deposit and extraction pipelines for the Charlotte Regional Data Trust, and began orchestrating workflows using Apache Airflow in an AWS environment.

Data Engineering on Kailas Venkitasubramanian
