Prof. Arnab Bhattacharyya
School of Computing, National University of Singapore, Singapore
Topic: Learning and Inference from Confounded and Biased Data
Abstract:
I will give an overview of current algorithmic research in the area of causality. The four-part sequence of talks will cover: (1) Introduction to causal inference, and formulation of computational problems; (2) Algorithms and complexity of learning high-dimensional graphical models; (3) Algorithms and complexity of inferring causal effects from observational and experimental data; (4) Learning and inference on censored and truncated data.
In the first session, I will introduce the notion of causal models and their distinction from statistical models. I will discuss both the graphical model framework and the potential outcome framework for causality, although the focus will mostly be on the former. I will then describe some natural computational problems that arise in these contexts but have only been rigorously studied very recently.
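As a toy illustration of the causal/statistical distinction, the sketch below simulates a pair of variables correlated only through a hidden confounder, so that conditioning on X and intervening on X give different answers; the model, numbers, and variable names are illustrative only, not drawn from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Toy structural causal model with a hidden confounder U:
#   U ~ Bernoulli(0.5); X mostly copies U; Y mostly copies U and ignores X.
# So X has no causal effect on Y, but X and Y are correlated through U.
u = rng.binomial(1, 0.5, n)
x = (u + rng.binomial(1, 0.1, n)) % 2
y = (u + rng.binomial(1, 0.1, n)) % 2

# Observational (statistical) query: P(Y = 1 | X = 1).
p_cond = y[x == 1].mean()

# Interventional (causal) query: P(Y = 1 | do(X = 1)).
# Setting X = 1 by intervention leaves U, and hence Y, untouched.
p_do = y.mean()

print(f"P(Y=1 | X=1)     = {p_cond:.3f}")   # about 0.82, inflated by confounding
print(f"P(Y=1 | do(X=1)) = {p_do:.3f}")     # about 0.50, the true causal effect
```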
In the second session, I will focus on the problem of learning causal models from data, particularly in the graphical model framework. First, I will show that if the underlying graph is known, then we can design algorithms with optimal sample complexity. Next, I will move to the question of learning the graph itself. Here, I will highlight known computational hardness results as well as the glaring open problems that remain. I will also discuss nearly optimal algorithms for learning tree models.
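For the tree-model part, the classical baseline is the Chow-Liu algorithm: estimate all pairwise mutual informations from samples and return a maximum-weight spanning tree. Below is a minimal sketch for binary variables under those assumptions (the function name and the synthetic chain example are illustrative, not from the talk); it uses scipy's spanning-tree routine on negated weights.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_edges(samples: np.ndarray) -> list[tuple[int, int]]:
    """Edges of a Chow-Liu tree for binary data; rows are samples, columns variables."""
    n, d = samples.shape
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            # Empirical joint distribution of (X_i, X_j) and its marginals.
            joint = np.array([[np.mean((samples[:, i] == a) & (samples[:, j] == b))
                               for b in range(2)] for a in range(2)])
            pi, pj = joint.sum(axis=1), joint.sum(axis=0)
            # Empirical mutual information I(X_i; X_j); 0*log(0) terms become NaN
            # and are dropped by nansum.
            with np.errstate(divide="ignore", invalid="ignore"):
                terms = joint * np.log(joint / np.outer(pi, pj))
            mi[i, j] = np.nansum(terms) + 1e-12  # offset so zero-weight edges survive
    # Maximum-weight spanning tree on MI weights = minimum spanning tree on -MI.
    mst = minimum_spanning_tree(-mi)
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist()))

# Example: recover a chain X0 -- X1 -- X2 from noisy-copy samples.
rng = np.random.default_rng(0)
x0 = rng.binomial(1, 0.5, 100_000)
x1 = (x0 + rng.binomial(1, 0.1, 100_000)) % 2
x2 = (x1 + rng.binomial(1, 0.1, 100_000)) % 2
print(chow_liu_edges(np.column_stack([x0, x1, x2])))  # expect edges (0, 1) and (1, 2)
```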
In the third session, I will describe work on the classical problem of inferring causal (treatment) effects from data. I will start by describing work in the potential outcome framework from the econometric literature, such as double robustness, semiparametric estimators, and their asymptotic behavior. I will also present my perspective on them from a computer science point of view. Then, I will discuss inference problems on graphical models and describe algorithmic and complexity results in that context.
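To make "double robustness" concrete, a standard example is the augmented inverse-propensity-weighted (AIPW) estimator of the average treatment effect, which stays consistent if either the outcome model or the propensity model is correctly specified. A minimal sketch with simple plug-in nuisance models follows; the function name and synthetic data are illustrative assumptions, and in practice the nuisance models would be cross-fitted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def aipw_ate(X, t, y):
    """Doubly robust (AIPW) estimate of the average treatment effect E[Y(1) - Y(0)].

    X: (n, d) covariates, t: (n,) binary treatment, y: (n,) outcomes.
    """
    # Propensity model e(x) = P(T = 1 | X = x).
    e = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]
    e = np.clip(e, 1e-3, 1 - 1e-3)  # avoid extreme inverse-propensity weights

    # Outcome models mu_1(x) = E[Y | T=1, X=x] and mu_0(x) = E[Y | T=0, X=x].
    mu1 = LinearRegression().fit(X[t == 1], y[t == 1]).predict(X)
    mu0 = LinearRegression().fit(X[t == 0], y[t == 0]).predict(X)

    # AIPW score: outcome-model contrast plus an inverse-propensity correction.
    psi = (mu1 - mu0
           + t * (y - mu1) / e
           - (1 - t) * (y - mu0) / (1 - e))
    return psi.mean()

# Synthetic check: treatment is confounded by X[:, 0]; true ATE is 2.0 by construction.
rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = 2.0 * t + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
print(aipw_ate(X, t, y))  # close to 2.0
```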
In the fourth session, I will consider other aspects of the data generation process beyond confounding. I will describe known results for learning high-dimensional models from adversarially corrupted data. Then, I will consider truncation (where some samples are removed from the dataset entirely) and censoring (where some feature values within a sample are masked or clipped rather than removed). Although these are classical topics, algorithmic research on them is fairly recent, and I will attempt to give a high-level overview of current work as well as the main open problems.
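A toy illustration of the difference: truncating a Gaussian sample at a threshold discards out-of-range draws entirely, while one common form of censoring keeps every draw but records out-of-range values as the threshold; a naive empirical mean is biased in both cases. The numbers below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=1.0, size=1_000_000)   # full (unobserved) sample
c = 1.0                                              # observation threshold

truncated = z[z <= c]          # truncation: draws above c never enter the dataset
censored = np.minimum(z, c)    # censoring: draws above c are recorded as exactly c

print(f"true mean      : {z.mean(): .3f}")           # about  0.000
print(f"truncated mean : {truncated.mean(): .3f}")   # about -0.29, biased downward
print(f"censored mean  : {censored.mean(): .3f}")    # about -0.08, biased less here
```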
Bio:
Arnab Bhattacharyya is an assistant professor at the School of Computing, National University of Singapore. He obtained his undergraduate and doctoral degrees in computer science from the Massachusetts Institute of Technology. Subsequently, he was a postdoctoral associate at Princeton University and Rutgers University, and an assistant professor and Ramanujan Fellow at the Indian Institute of Science, Bangalore. He is the recipient of a Google Research Award, an Amazon Faculty Research Award, and a National Research Foundation (Singapore) Fellowship in AI. His work has been recognized as "Best of PODS 2021", an ACM SIGMOD Research Highlight 2021, and a CACM Research Highlight 2022.
His research area is theoretical computer science and the foundations of data science, broadly construed. Specifically, he is interested in algorithms for problems involving high-dimensional data, causal inference, sublinear-time algorithms, complexity theory, and algorithmic models for physical systems.