<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Data Engineering on Kailas Venkitasubramanian</title>
    <link>/categories/data-engineering/</link>
    <description>Recent content in Data Engineering on Kailas Venkitasubramanian</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sat, 09 Mar 2024 00:00:00 +0000</lastBuildDate>
    <atom:link href="/categories/data-engineering/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Designing Reproducible Data Pipelines for Community Research</title>
      <link>/blog/series/reproducible-research-series/2022-04-10-designing-reproducible-data-pipelines/</link>
      <pubDate>Sat, 09 Mar 2024 00:00:00 +0000</pubDate>
      <guid>/blog/series/reproducible-research-series/2022-04-10-designing-reproducible-data-pipelines/</guid>
      <description>&lt;p&gt;In the first post of this series, I argued that reproducibility is not a technical luxury for community research institutions; it is an ethical and operational obligation. In this post, I want to move from philosophy to plumbing, because this is where reproducibility becomes real.&lt;/p&gt;&#xA;&lt;p&gt;Specifically: what does it mean to design &lt;em&gt;reproducible data pipelines&lt;/em&gt; in a community research environment?&lt;/p&gt;&#xA;&lt;p&gt;At the UNC Charlotte Urban Institute, this question became concrete as we built the &lt;strong&gt;Quality of Life Explorer&lt;/strong&gt;, developed deposit and extraction pipelines for the &lt;strong&gt;Charlotte Regional Data Trust&lt;/strong&gt;, and began orchestrating workflows using Apache Airflow in an AWS environment.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
