Open Science on Kailas Venkitasubramanian

Embracing multilingualism in data science

Thu, 10 Apr 2025 00:00:00 +0000

In the previous posts of this series, I covered why reproducibility matters and how we are designing reproducible data pipelines at the UNC Charlotte Urban Institute. Both of those efforts rest on a more basic question: which programming languages should a small research team actually use? This post is about that layer underneath.

Specifically, I want to argue that embracing multilingualism (fluency in both R and Python, rather than loyalty to one) has quietly done more for our team's output than almost any other choice we've made.

Changing CRDT operations under a Cloud

Fri, 09 Jun 2023 00:00:00 +0000

The promise and peril of a large contract

Many of the earlier challenges in managing the technical operations at the data trust stemmed from two gaps: a poor understanding of the scope and effort required for a given piece of work, and no barometer to measure productivity (or the lack of it). As a result, everyone knew that a given piece of work took a month to complete, and everyone agreed that this delay was unacceptable, but no one could pinpoint where the bottlenecks were or why they existed.

Plunging into the Data Trust Black Box and Deep Cleaning the System

Sat, 20 May 2023 00:00:00 +0000

Diving into the world of administrative data and CRDT

"Administrative data is messy" is less an adage than a reality. When I took the reins of the data infrastructure and analytical operations of the Institute for Social Capital (ISC, now the Charlotte Regional Data Trust) in mid-2021, the messiness extended beyond the data. The dysfunction ran deep: in how data was collected and organized, how data operations and analyses were conducted, how information was gathered from stakeholders, and how data was disseminated.

UI Reproducibility Project

Sat, 31 Dec 2022 00:00:00 +0000

Summary

Background

Diverse research backgrounds, skills, and operational practices make our institute versatile and nimble in addressing research problems that cross several domains. But they have also allowed our analytical practices to remain fragmented and inefficient.

The Urban Institute data science team recognized the significance of reproducibility for analytical practice in community research in two distinct contexts: (1) operational efficiency, through streamlined use and reuse of data, analytical tools, and assets; and (2) a culture of transparency and trust that underpins reproducible research, whose products become fully replicable and auditable.

UI Data and Analytics Guide

Wed, 03 Aug 2022 00:00:00 +0000

Summary

Objective(s) and Scope

The project aims to create a comprehensive guide to all operational processes of the Urban Institute, serving as the primary point of reference for all research staff in managing the institute's data and analytical resources.

The manual will be created using R Markdown, a tool for authoring rich, interactive documents. It will be hosted as a website that team members can easily update and maintain.