Reproducibility on Kailas Venkitasubramanian

Embracing multilingualism in data science

Thu, 10 Apr 2025 00:00:00 +0000

In the previous posts of this series, I covered why reproducibility matters and how we are designing reproducible data pipelines at the UNC Charlotte Urban Institute. Both of those efforts, reproducibility and pipelines, rest on a more basic question: which programming languages should a small research team actually use? This post is about the layer underneath both.

Specifically, I want to argue that embracing multilingualism—fluency in both R and Python, rather than loyalty to one—has quietly done more for our team’s output than almost any other choice we’ve made.

Changing CRDT operations under a Cloud

Fri, 09 Jun 2023 00:00:00 +0000

The promise and peril of a large contract

Many of the previous challenges in managing the technical operations at the data trust stemmed from a lack of understanding of the scope and extent of effort for a given piece of work, and from having no barometer to measure productivity (or the lack of it). This meant that everyone knew that a given piece of work took 1 month to complete, everyone agreed that this delay was not acceptable, but no one could really pinpoint where the bottlenecks were and why they existed.

Plunging into the Data Trust black box, and Deep Cleaning the System

Sat, 20 May 2023 00:00:00 +0000

Diving into the world of administrative data and CRDT

“Administrative data is messy” is not so much an adage as it is a reality. When I took the reins of managing the data infrastructure and analytical operations of the Institute for Social Capital or ISC (now called the Charlotte Regional Data Trust) in the middle of 2021, the messiness extended beyond data. The dysfunction ran deep in how data was collected and organized, the way data operations and analyses were conducted, how information was collected from stakeholders, and how data was disseminated.

Towards reproducible data science for community and policy research - An experiential roadmap

Mon, 06 Mar 2023 00:00:00 +0000

On developing a reproducible data science framework and practice at the Charlotte Urban Institute

UI Reproducibility Project

Sat, 31 Dec 2022 00:00:00 +0000

Summary

Background

Diverse research backgrounds, skills, and operational practices make our institute versatile and nimble in addressing research problems that cross several domains. But they have also allowed research analytical practices to remain fragmented and inefficient.

The Urban Institute data science team recognized the significance of reproducibility in analytical community research practice in two distinct contexts: 1) operational efficiency via streamlined use and reuse of data, analytical tools, and assets; and 2) developing a culture of transparency and trust that underpins reproducible research whose products are fully replicable and auditable.

UI Data and Analytics Guide

Wed, 03 Aug 2022 00:00:00 +0000

Summary

Objective(s) and Scope

The project aims to create a comprehensive guide to all operational processes of the Urban Institute, serving as a primary point of reference for all research staff in managing data and analytical resources of the institute.

The manual will be created using R Markdown, a tool that allows for the creation of rich, interactive documents. The manual will be hosted as a website that can be easily updated and maintained by team members.

Charlotte Regional Data Trust - Technical Operations Manual

Sat, 06 Aug 2022 00:00:00 +0000

On how we developed the technical operations manual at the Charlotte Regional Data Trust

Reproducible Research Framework at the Charlotte Urban Institute: Why It Matters Now

Fri, 08 Jul 2022 00:00:00 +0000

In recent years, conversations about reproducibility have moved from academic journals into policy circles, foundations, and government agencies. What was once framed as a “replication crisis” in psychology has broadened into a wider concern about the credibility, transparency, and cumulative nature of scientific work across disciplines (Open Science Collaboration, 2015).

For those of us engaged in quantitative community research—especially in dynamic regional contexts like the Charlotte metropolitan area—reproducibility is not merely a philosophical concern. It is an operational one.

CRDT Anonymization and Privacy Project

Mon, 06 Jun 2022 00:00:00 +0000

Summary

Objective(s) and Scope

The project involves the development and implementation of protocols and best practices in privacy for data dissemination, including statistical disclosure control procedures for the integrated data system hosted by CRDT. This includes creating guidelines for data collection, storage, and dissemination, as well as implementing robust technical measures to prevent unauthorized access and disclosure of sensitive information.

The project will start with a comprehensive review of current privacy practices and identification of areas that need improvement. Based on this review, a set of protocols and best practices will be developed and incorporated into the organization’s data management processes. This will include the implementation of statistical disclosure control procedures to protect sensitive information. Training sessions will be conducted to educate employees on the new privacy protocols and best practices.
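
As a rough illustration of one common statistical disclosure control procedure, the sketch below applies primary small-cell suppression to a frequency table before release. The threshold of 5, the function name `suppress_small_cells`, the choice to leave zero counts visible, and the sample counts are all assumptions made for illustration; they are not CRDT's actual rules or data.

```python
from typing import Dict, Optional

# Hypothetical minimum cell size; real thresholds are set by policy.
MIN_CELL_SIZE = 5


def suppress_small_cells(
    counts: Dict[str, int], threshold: int = MIN_CELL_SIZE
) -> Dict[str, Optional[int]]:
    """Return a copy of `counts` with small nonzero cells replaced by None.

    Assumption: zero counts are kept as-is; some policies suppress them too.
    """
    return {
        label: (None if 0 < n < threshold else n)
        for label, n in counts.items()
    }


# Made-up counts by area, purely for demonstration.
example = {"Area A": 42, "Area B": 3, "Area C": 0, "Area D": 5}
released = suppress_small_cells(example)
```

Primary suppression alone is not sufficient when row or column totals are also published, because a suppressed cell can be recovered by subtraction; a fuller procedure would add complementary suppression or rounding.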

CRDT Technical Operations Manual

Mon, 04 Apr 2022 00:00:00 +0000

Summary

Objective(s) and Scope

The goal of this project is to create a comprehensive technical operations manual that documents all operational processes of the Charlotte Regional Data Trust. The manual will serve as a single point of reference and provide clear, up-to-date information on all operational procedures.

CRDT Data Documentation Project

Thu, 03 Mar 2022 00:00:00 +0000

Summary

Objective(s) and Scope

The project seeks to enhance the quality and completeness of CRDT’s data documentation, and establish a centralized and organized infrastructure for storing and managing metadata.

The project includes reviewing the existing metadata, developing standardized data documentation (metadata, codebook, data dictionary), and implementing a data infrastructure for storing and organizing metadata for all databases.
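
One way to seed a standardized data dictionary is to generate a skeleton entry per variable directly from the data and let a data steward fill in the descriptions. The sketch below is a minimal example under that assumption; the function name `build_data_dictionary`, the entry fields, and the sample columns are hypothetical and do not reflect CRDT's actual schema.

```python
import csv
import io
from typing import Dict, List


def build_data_dictionary(csv_text: str) -> List[Dict[str, object]]:
    """Build one skeleton dictionary entry per column of a CSV table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        # Treat empty strings (and absent trailing fields) as missing.
        present = [v for v in values if v not in ("", None)]
        entries.append({
            "variable": name,
            "n_present": len(present),
            "n_missing": len(values) - len(present),
            "n_distinct": len(set(present)),
            "description": "",  # to be written by a data steward
        })
    return entries


# Made-up sample table, purely for demonstration.
SAMPLE = (
    "person_id,program,enroll_date\n"
    "1,A,2021-01-04\n"
    "2,B,\n"
    "3,A,2021-02-11\n"
)
dictionary = build_data_dictionary(SAMPLE)
```

Keeping the generated counts next to the hand-written descriptions makes it easier to spot variables whose documentation has drifted from the data.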

Expected Outcomes

Improved data documentation quality and completeness.
Consistent and standardized data documentation across all databases.
A centralized and organized infrastructure for storing and accessing metadata.