<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Reproducibility on Kailas Venkitasubramanian</title>
    <link>/tags/reproducibility/</link>
    <description>Recent content in Reproducibility on Kailas Venkitasubramanian</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Thu, 10 Apr 2025 00:00:00 +0000</lastBuildDate>
    <atom:link href="/tags/reproducibility/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Embracing multilingualism in data science</title>
      <link>/blog/series/reproducible-research-series/2025-04-10-multilingualism-in-data-science/</link>
      <pubDate>Thu, 10 Apr 2025 00:00:00 +0000</pubDate>
      <guid>/blog/series/reproducible-research-series/2025-04-10-multilingualism-in-data-science/</guid>
      <description>&lt;p&gt;Both of those efforts — reproducibility and pipelines — rest on a more basic question: which programming languages should a small research team actually use? In the previous posts of this series, I covered &#xA;&lt;a href=&#34;/blog/series/reproducible-research-series/2022-07-08-building-blocks-of-a-reproducible-research-framework/&#34;&gt;why reproducibility matters&lt;/a&gt; and how we are &#xA;&lt;a href=&#34;/blog/series/reproducible-research-series/2022-04-10-designing-reproducible-data-pipelines/&#34;&gt;designing reproducible data pipelines&lt;/a&gt; at the UNC Charlotte Urban Institute. This post is about the layer underneath both.&lt;/p&gt;&#xA;&lt;p&gt;Specifically, I want to argue that embracing &lt;em&gt;multilingualism&lt;/em&gt;&amp;mdash;fluency in both R and Python, rather than loyalty to one&amp;mdash;has quietly done more for our team&amp;rsquo;s output than almost any other choice we&amp;rsquo;ve made.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Changing CRDT operations under a Cloud</title>
      <link>/blog/series/crdt-telenovela-series/2023-06-09-making-sense-of-data-and-documenting-it/</link>
      <pubDate>Fri, 09 Jun 2023 00:00:00 +0000</pubDate>
      <guid>/blog/series/crdt-telenovela-series/2023-06-09-making-sense-of-data-and-documenting-it/</guid>
      <description>&lt;h2 id=&#34;the-promise-and-peril-of-a-large-contract&#34;&gt;The promise and peril of a large contract&#xA;  &lt;a href=&#34;#the-promise-and-peril-of-a-large-contract&#34;&gt;&lt;svg class=&#34;anchor-symbol&#34; aria-hidden=&#34;true&#34; height=&#34;26&#34; width=&#34;26&#34; viewBox=&#34;0 0 22 22&#34; xmlns=&#34;http://www.w3.org/2000/svg&#34;&gt;&#xA;      &lt;path d=&#34;M0 0h24v24H0z&#34; fill=&#34;currentColor&#34;&gt;&lt;/path&gt;&#xA;      &lt;path d=&#34;M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z&#34;&gt;&lt;/path&gt;&#xA;    &lt;/svg&gt;&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;Many of the earlier challenges in managing technical operations at the data trust stemmed from a poor understanding of the scope and effort a given piece of work required, and from having no barometer to measure productivity (or the lack of it). This meant that everyone knew a given piece of work took one month to complete, and everyone agreed that this delay was not acceptable, but no one could really pinpoint where the bottlenecks were or why they existed.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Plunging into the Data Trust black box, and Deep Cleaning the System</title>
      <link>/blog/series/crdt-telenovela-series/2023-05-20-plunging-into-data-trust/</link>
      <pubDate>Sat, 20 May 2023 00:00:00 +0000</pubDate>
      <guid>/blog/series/crdt-telenovela-series/2023-05-20-plunging-into-data-trust/</guid>
      <description>&lt;h2 id=&#34;diving-into-the-world-of-administrative-data-and-crdt&#34;&gt;Diving into the world of administrative data and CRDT&#xA;  &lt;a href=&#34;#diving-into-the-world-of-administrative-data-and-crdt&#34;&gt;&lt;svg class=&#34;anchor-symbol&#34; aria-hidden=&#34;true&#34; height=&#34;26&#34; width=&#34;26&#34; viewBox=&#34;0 0 22 22&#34; xmlns=&#34;http://www.w3.org/2000/svg&#34;&gt;&#xA;      &lt;path d=&#34;M0 0h24v24H0z&#34; fill=&#34;currentColor&#34;&gt;&lt;/path&gt;&#xA;      &lt;path d=&#34;M3.9 12c0-1.71 1.39-3.1 3.1-3.1h4V7H7c-2.76.0-5 2.24-5 5s2.24 5 5 5h4v-1.9H7c-1.71.0-3.1-1.39-3.1-3.1zM8 13h8v-2H8v2zm9-6h-4v1.9h4c1.71.0 3.1 1.39 3.1 3.1s-1.39 3.1-3.1 3.1h-4V17h4c2.76.0 5-2.24 5-5s-2.24-5-5-5z&#34;&gt;&lt;/path&gt;&#xA;    &lt;/svg&gt;&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;That administrative data is messy is less an adage than a reality. When I took the reins of the data infrastructure and analytical operations of the Institute for Social Capital, or ISC (now called the Charlotte Regional Data Trust), in the middle of 2021, the messiness extended beyond the data. The dysfunction ran deep in how data was collected and organized, how data operations and analyses were conducted, how information was gathered from stakeholders, and how data was disseminated.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Towards reproducible data science for community and policy research - An experiential roadmap</title>
      <link>/talk/towards-reproducible-data-science/</link>
      <pubDate>Mon, 06 Mar 2023 00:00:00 +0000</pubDate>
      <guid>/talk/towards-reproducible-data-science/</guid>
      <description>On developing a reproducible data science framework and practice at the Charlotte Urban Institute</description>
    </item>
    <item>
      <title>UI Reproducibility Project</title>
      <link>/project/ui-reproducibility-project/</link>
      <pubDate>Sat, 31 Dec 2022 00:00:00 +0000</pubDate>
      <guid>/project/ui-reproducibility-project/</guid>
      <description>&lt;div id=&#34;&#34; class=&#34;panelset&#34;&gt;&#xA;  &#xD;&#xA;&lt;div class=&#34;panel&#34;&gt;&#xA;  &lt;div class=&#34;panel-name&#34;&gt;Summary&lt;/div&gt;&#xA;  &#xA;  &lt;p&gt;&#xA;&#xA;&#xA;&#xA;&lt;h5 id=&#34;background&#34;&gt;Background&#xA;  &lt;a href=&#34;#background&#34;&gt;&lt;/a&gt;&#xA;&lt;/h5&gt;&#xA;&lt;p&gt;Diverse research backgrounds, skills and operational practices make our institute versatile and nimble in addressing research problems that cross several domains. But they have also allowed analytical research practices to remain fragmented and inefficient.&lt;/p&gt;&#xA;&lt;p&gt;The Urban Institute data science team recognized the significance of reproducibility in analytical community research practice in two distinct contexts: 1) operational efficiency via streamlined use and reuse of data, analytical tools and assets; and 2) developing a culture of transparency and trust that underpins reproducible research whose products are fully replicable and auditable.&lt;/p&gt;</description>
    </item>
    <item>
      <title>UI Data and Analytics Guide</title>
      <link>/project/ui-data-analytics-guide/</link>
      <pubDate>Wed, 03 Aug 2022 00:00:00 +0000</pubDate>
      <guid>/project/ui-data-analytics-guide/</guid>
      <description>&lt;div id=&#34;&#34; class=&#34;panelset&#34;&gt;&#xA;  &#xD;&#xA;&lt;div class=&#34;panel&#34;&gt;&#xA;  &lt;div class=&#34;panel-name&#34;&gt;Summary&lt;/div&gt;&#xA;  &#xA;  &lt;p&gt;&#xA;&#xA;&#xA;&#xA;&lt;h5 id=&#34;objectives-and-scope&#34;&gt;Objective(s) and Scope&#xA;  &lt;a href=&#34;#objectives-and-scope&#34;&gt;&lt;/a&gt;&#xA;&lt;/h5&gt;&#xA;&lt;p&gt;The project aims to create a comprehensive guide to all operational processes of the Urban Institute, serving as a primary point of reference for all research staff in managing the data and analytical resources of the institute.&lt;/p&gt;&#xA;&lt;p&gt;The manual will be created using Rmarkdown, a tool that allows for the creation of rich, interactive documents. The manual will be hosted as a website that can be easily updated and maintained by team members.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Charlotte Regional Data Trust - Technical Operations Manual</title>
      <link>/talk/tech-operations-manual/</link>
      <pubDate>Sat, 06 Aug 2022 00:00:00 +0000</pubDate>
      <guid>/talk/tech-operations-manual/</guid>
      <description>On how we developed the technical operations manual at the Charlotte Regional Data Trust</description>
    </item>
    <item>
      <title>Reproducible Research Framework at the Charlotte Urban Institute: Why It Matters Now</title>
      <link>/blog/series/reproducible-research-series/2022-07-08-building-blocks-of-a-reproducible-research-framework/</link>
      <pubDate>Fri, 08 Jul 2022 00:00:00 +0000</pubDate>
      <guid>/blog/series/reproducible-research-series/2022-07-08-building-blocks-of-a-reproducible-research-framework/</guid>
      <description>&lt;p&gt;In recent years, conversations about reproducibility have moved from academic journals into policy circles, foundations, and government agencies. What was once framed as a “replication crisis” in psychology has broadened into a wider concern about the credibility, transparency, and cumulative nature of scientific work across disciplines (Open Science Collaboration, 2015).&lt;/p&gt;&#xA;&lt;p&gt;For those of us engaged in quantitative community research—especially in dynamic regional contexts like the Charlotte metropolitan area—reproducibility is not merely a philosophical concern. It is an operational one.&lt;/p&gt;</description>
    </item>
    <item>
      <title>CRDT Anonymization and Privacy Project</title>
      <link>/project/crdt-anonymization-and-privacy-project/</link>
      <pubDate>Mon, 06 Jun 2022 00:00:00 +0000</pubDate>
      <guid>/project/crdt-anonymization-and-privacy-project/</guid>
      <description>&lt;div id=&#34;&#34; class=&#34;panelset&#34;&gt;&#xA;  &#xD;&#xA;&lt;div class=&#34;panel&#34;&gt;&#xA;  &lt;div class=&#34;panel-name&#34;&gt;Summary&lt;/div&gt;&#xA;  &#xA;  &lt;p&gt;&#xA;&#xA;&#xA;&#xA;&lt;h5 id=&#34;objectives-and-scope&#34;&gt;Objective(s) and Scope&#xA;  &lt;a href=&#34;#objectives-and-scope&#34;&gt;&lt;/a&gt;&#xA;&lt;/h5&gt;&#xA;&lt;p&gt;The project involves the development and implementation of protocols and best practices in privacy for data dissemination, including statistical disclosure control procedures of the integrated data system hosted by CRDT. This includes creating guidelines for data collection, storage, and dissemination, as well as implementing robust technical measures to prevent unauthorized access and disclosure of sensitive information.&lt;/p&gt;&#xA;&lt;p&gt;The project will start with a comprehensive review of current privacy practices and identification of areas that need improvement. Based on this review, a set of protocols and best practices will be developed and incorporated into the organization&amp;rsquo;s data management processes. This will include the implementation of statistical disclosure control procedures to protect sensitive information. Training sessions will be conducted to educate employees on the new privacy protocols and best practices.&lt;/p&gt;</description>
    </item>
    <item>
      <title>CRDT Technical Operations Manual</title>
      <link>/project/crdt-technical-operations-manual/</link>
      <pubDate>Mon, 04 Apr 2022 00:00:00 +0000</pubDate>
      <guid>/project/crdt-technical-operations-manual/</guid>
      <description>&lt;div id=&#34;&#34; class=&#34;panelset&#34;&gt;&#xA;  &#xD;&#xA;&lt;div class=&#34;panel&#34;&gt;&#xA;  &lt;div class=&#34;panel-name&#34;&gt;Summary&lt;/div&gt;&#xA;  &#xA;  &lt;p&gt;&#xA;&#xA;&#xA;&#xA;&lt;h5 id=&#34;objectives-and-scope&#34;&gt;Objective(s) and Scope&#xA;  &lt;a href=&#34;#objectives-and-scope&#34;&gt;&lt;/a&gt;&#xA;&lt;/h5&gt;&#xA;&lt;p&gt;The goal of this project is to create a comprehensive technical operations manual that documents all operational processes of the Charlotte Regional Data Trust. The manual will serve as a single point of reference and provide clear, up-to-date information on all operational procedures.&lt;/p&gt;&#xA;&lt;p&gt;The manual will be created using Rmarkdown, a tool that allows for the creation of rich, interactive documents. The manual will be hosted as a website that can be easily updated and maintained by team members.&lt;/p&gt;</description>
    </item>
    <item>
      <title>CRDT Data Documentation Project</title>
      <link>/project/crdt-data-documentation-project/</link>
      <pubDate>Thu, 03 Mar 2022 00:00:00 +0000</pubDate>
      <guid>/project/crdt-data-documentation-project/</guid>
      <description>&lt;div id=&#34;&#34; class=&#34;panelset&#34;&gt;&#xA;  &#xD;&#xA;&lt;div class=&#34;panel&#34;&gt;&#xA;  &lt;div class=&#34;panel-name&#34;&gt;Summary&lt;/div&gt;&#xA;  &#xA;  &lt;p&gt;&#xA;&#xA;&#xA;&#xA;&lt;h5 id=&#34;objectives-and-scope&#34;&gt;Objective(s) and Scope&#xA;  &lt;a href=&#34;#objectives-and-scope&#34;&gt;&lt;/a&gt;&#xA;&lt;/h5&gt;&#xA;&lt;p&gt;The project seeks to enhance the quality and completeness of CRDT&amp;rsquo;s data documentation, and establish a centralized and organized infrastructure for storing and managing metadata.&lt;/p&gt;&#xA;&lt;p&gt;The project includes reviewing the existing metadata, developing standardized data documentation (metadata, codebook, data dictionary), and implementing a data infrastructure for storing and organizing metadata for all databases.&lt;/p&gt;&#xA;&#xA;&#xA;&#xA;&#xA;&lt;h5 id=&#34;expected-outcomes&#34;&gt;Expected Outcomes&#xA;  &lt;a href=&#34;#expected-outcomes&#34;&gt;&lt;/a&gt;&#xA;&lt;/h5&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Improved data documentation quality and completeness.&lt;/li&gt;&#xA;&lt;li&gt;Consistent and standardized data documentation across all databases.&lt;/li&gt;&#xA;&lt;li&gt;A centralized and organized infrastructure for storing and accessing metadata.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
