Causal Inference From Observational Data

  • Consider a treatment \(D\) and outcome \(Y\)

  • Interested in the population average treatment effect (PATE) of \(D\) on \(Y\): \[E[Y | do(D=d)] - E[Y | do(D=d')]\]

  • In general, the PATE is not the same as \[E[Y | D=d] - E[Y | D=d']\]

Confounders

Need to control for the confounder \(U\) (a common cause of \(D\) and \(Y\)) to consistently estimate the causal effect

Confounding bias

  • A regression of \(Y\) on \(D\) in the observed data fails because the distribution of \(U\) differs between the two treatment arms

  • We try to condition on as many observed confounders as possible to mitigate potential confounding bias

  • Commonly assumed that there are “no unobserved confounders” (NUC) but this is unverifiable
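A minimal simulation sketch of this bias, under assumed values that do not come from the slides: a binary treatment, one Gaussian confounder \(U\) that raises both the chance of treatment and the outcome, and a true effect of 1.0. The naive difference in means is biased because \(U\) is shifted between the arms. A regression that also adjusts for \(U\) recovers the effect, which is exactly what becomes impossible when \(U\) is unmeasured.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(n=20000, effect=1.0, seed=1):
    """Hypothetical data: U raises both the chance of treatment and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)                          # confounder
        d = 1 if u + rng.gauss(0, 1) > 0 else 0      # treatment more likely when U is high
        y = effect * d + 2.0 * u + rng.gauss(0, 1)   # outcome depends on D and U
        rows.append((d, u, y))
    return rows

def naive_difference(rows):
    """E[Y | D=1] - E[Y | D=0], ignoring U."""
    treated = [y for d, _, y in rows if d == 1]
    control = [y for d, _, y in rows if d == 0]
    return mean(treated) - mean(control)

def adjusted_effect(rows):
    """Coefficient on D in the OLS fit Y ~ D + U, via Frisch-Waugh-Lovell."""
    ds, us, ys = zip(*rows)
    mu = mean(us)

    def residualize(v):
        # Residuals of v after a least-squares fit on U (with intercept)
        mv = mean(v)
        slope = sum((a - mu) * (b - mv) for a, b in zip(us, v)) / sum((a - mu) ** 2 for a in us)
        return [b - mv - slope * (a - mu) for a, b in zip(us, v)]

    rd, ry = residualize(ds), residualize(ys)
    return sum(a * b for a, b in zip(rd, ry)) / sum(a * a for a in rd)

rows = simulate()
print(f"naive: {naive_difference(rows):.2f}, adjusted for U: {adjusted_effect(rows):.2f}")
```

In population terms the naive contrast here is \(1 + 2\,(E[U \mid D=1] - E[U \mid D=0]) \approx 3.3\), far from the true effect of 1. The adjusted estimate should land close to 1.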

Unmeasured Confounding

  • When there are unmeasured confounders, additional assumptions are needed to identify causal effects

  • Sensitivity analysis: how strong would unmeasured confounding have to be to explain away the observed association? (Cinelli and Hazlett 2020)

  • Null controls: use negative control exposures or outcomes to detect and adjust for unmeasured confounding (Shi, Miao, and Tchetgen Tchetgen 2020)
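As one concrete sensitivity summary, Cinelli and Hazlett (2020) define a robustness value \(RV_q\). It is the minimum partial \(R^2\) that an unobserved confounder would need with both the treatment and the outcome to reduce the estimate by \(100q\%\). It is computed from the treatment's t-statistic and residual degrees of freedom. The sketch below implements that formula. The function name is ours, not a library API.

```python
import math

def robustness_value(t_stat, dof, q=1.0):
    """Robustness value RV_q (Cinelli and Hazlett 2020).

    Minimum partial R^2 that an unobserved confounder would need with both the
    treatment and the outcome to reduce the estimated effect by 100*q percent.
    """
    f_q = q * abs(t_stat) / math.sqrt(dof)  # partial Cohen's f of the treatment, times q
    return 0.5 * (math.sqrt(f_q ** 4 + 4 * f_q ** 2) - f_q ** 2)

rv = robustness_value(t_stat=3.744, dof=1434)
print(round(rv, 3))
```

Take t = 3.744 on 1434 degrees of freedom, the drinking coefficient in the blood-mercury regression later in this deck. This gives \(RV \approx 0.094\). A confounder explaining about 9% of the residual variance of both drinking and mercury would therefore suffice to explain the estimate away.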

A Simple Example

  • Observational data from the National Health and Nutrition Examination Survey (NHANES) on alcohol consumption.

  • Light alcohol consumption is positively correlated with blood levels of HDL (“good cholesterol”)

  • Define “light alcohol consumption” as 1–2 alcoholic beverages per day

  • Non-drinkers: self-reported drinking of one drink a week or less

  • Control for age, gender, and an indicator for educational attainment

HDL and alcohol consumption


What must be true for this correlation to be non-causal?

Blood mercury and alcohol consumption

summary(lm(Y[, "Methylmercury"] ~ drinking + X))

Call:
lm(formula = Y[, "Methylmercury"] ~ drinking + X)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3570 -0.7363 -0.0728  0.6242  4.1127 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.442044   0.096385   4.586 4.91e-06 ***
drinking     0.364096   0.097244   3.744 0.000188 ***
Xage         0.008186   0.001536   5.330 1.14e-07 ***
Xgender     -0.062664   0.052290  -1.198 0.230966    
Xeduc        0.269815   0.054126   4.985 6.95e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.975 on 1434 degrees of freedom
Multiple R-squared:  0.05209,   Adjusted R-squared:  0.04945 
F-statistic:  19.7 on 4 and 1434 DF,  p-value: 8.41e-16



But… there is no plausible causal mechanism by which light drinking would raise blood mercury

Residual Correlation

hdl_fit <- lm(Y[, "HDL"] ~ drinking + X)
mercury_fit <- lm(Y[, "Methylmercury"] ~ drinking + X)

cor.test(hdl_fit$residuals, mercury_fit$residuals)

    Pearson's product-moment correlation

data:  hdl_fit$residuals and mercury_fit$residuals
t = 3.7569, df = 1437, p-value = 0.0001789
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.04718758 0.14953581
sample estimates:
      cor 
0.0986225 


Residual correlation might be indicative of confounding bias
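Here is a small simulation sketch of the residual-correlation logic. The data-generating process is hypothetical: a binary exposure, one unmeasured confounder \(U\), and observed covariates omitted for brevity. Two outcomes are each regressed on the exposure. One has a true effect (HDL-like) and one has none (mercury-like). If \(U\) affects both outcomes, their residuals stay correlated. If it does not, the residual correlation is near zero. All names and coefficients are illustrative.

```python
import math
import random

def ols_residuals(x, y):
    """Residuals of y after a least-squares fit on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def residual_correlation(confounded, n=5000, seed=2):
    """Correlation of residuals from two outcome regressions on the same exposure."""
    rng = random.Random(seed)
    strength = 1.0 if confounded else 0.0   # hypothetical effect of U on both outcomes
    d, y_hdl, y_hg = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                               # unmeasured confounder
        di = 1 if u + rng.gauss(0, 1) > 0 else 0          # exposure depends on U
        d.append(di)
        y_hdl.append(0.5 * di + strength * u + rng.gauss(0, 1))  # outcome with a true effect
        y_hg.append(strength * u + rng.gauss(0, 1))              # outcome with no effect of D
    return pearson(ols_residuals(d, y_hdl), ols_residuals(d, y_hg))

print(f"with U: {residual_correlation(True):.2f}, without U: {residual_correlation(False):.2f}")
```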

Multivariate Causal Inference and Unmeasured Confounding

The effect of Pollution on Birth Weight

How does pre-natal exposure to PM\(_{2.5}\) affect birth weight?

PM\(_{2.5}\) and Birth Weight

The effect of Pollution on Birth Weight


Call:
lm(formula = bw_mean ~ pm25, data = mutate(pm25_data, pm25 = pm25/10))

Residuals:
    Min      1Q  Median      3Q     Max 
-501.56  -38.76   -1.35   37.71  330.60 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3337.181      2.784 1198.52   <2e-16 ***
pm25         -47.688      2.857  -16.69   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 64.38 on 8461 degrees of freedom
Multiple R-squared:  0.03187,   Adjusted R-squared:  0.03175 
F-statistic: 278.5 on 1 and 8461 DF,  p-value: < 2.2e-16

A 10 μg/m\(^3\) increase in PM\(_{2.5}\) is associated with a 48 g decrease in birth weight

The effect of Pollution on Birth Weight

  • Outcome: 2018–2024 ZIP code–level birth counts by weight category (California Vital Data, Cal-ViDa Query Tool)

  • Exposure: Annual ZIP code–level PM\(_{2.5}\) concentrations, derived from high-resolution estimates (Shen et al. 2024)

  • Observed covariates: Maternal age, race, education, nativity, prenatal care timing, household income, geographic coordinates, and calendar year

  • Potential unmeasured confounders: neighborhood deprivation, other socioeconomic factors, cultural and lifestyle factors

The effect of Pollution on Birth Weight

  • Meta-analyses: Birth weight decreases by 16–28 g per 10 μg/m\(^3\) increase in PM\(_{2.5}\) exposure during pregnancy (Gong et al. 2022).

  • High between-study heterogeneity (range −79.3 g to +24.9 g)

  • None of these studies handles unmeasured confounders that are not a measurable function of space

  • Motivation: challenges from confounding and measurement variability call for robust causal inference

Spatial Confounding

  • Spatial confounding: unmeasured confounders vary over space and are correlated with exposure
  • Common to assume the unmeasured confounders are a measurable function of space (Gilbert, Datta, and Ogburn 2021)
    • Spatial location can then serve as a proxy for the unmeasured confounders
    • Requires a nonspatial component of variation in the exposure
  • Use Double Machine Learning (DML) to adjust for spatial location and other observed covariates (Chernozhukov et al. 2018)

Double Machine Learning

  • Partially linear model: \(Y = \beta D + g(X) + \epsilon\).
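  • Estimate \(\mu(X) = E[Y | X]\) and \(e(X) = E[D | X]\) using nonparametric / ML methods
  • Let \(\tilde Y = Y - \hat \mu(X)\) and \(\tilde D = D - \hat e(X)\), and estimate the causal effect by regressing \(\tilde Y\) on \(\tilde D\) to get \(\hat \beta\).
This approach is Neyman orthogonal: errors in \(\hat \mu\) and \(\hat e\) affect \(\hat \beta\) only at second order, which is why it is often described as doubly robust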
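One step is implicit above: why the residual-on-residual regression recovers \(\beta\). Assume \(E[\epsilon \mid D, X] = 0\) in the partially linear model. Taking \(E[\,\cdot \mid X]\) of both sides and subtracting gives

\[\begin{aligned} \mu(X) &= \beta\, e(X) + g(X),\\ Y - \mu(X) &= \beta\,\bigl(D - e(X)\bigr) + \epsilon,\\ \beta &= \frac{E\bigl[(D - e(X))(Y - \mu(X))\bigr]}{E\bigl[(D - e(X))^2\bigr]}. \end{aligned}\]

The nuisance function \(g(\cdot)\) drops out. \(\beta\) is therefore the population no-intercept regression coefficient of \(Y - \mu(X)\) on \(D - e(X)\).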

Cross-fitting

We need to avoid overfitting / double-dipping using a technique analogous to cross-validation.

  • Divide the data into \(K\) folds.
  • For each fold \(k = 1, \dots, K\):
    • Train the outcome and treatment models on the other \(K-1\) folds and predict \(\hat \mu(X)\) and \(\hat e(X)\) on the held-out \(k\)th fold.
    • Regress \(\tilde Y^{(k)} \sim \tilde D^{(k)}\) to estimate \(\hat \beta^{(k)}\)
  • Estimate \(\hat \beta\) as \(\frac{1}{K} \sum_{k=1}^{K} \hat \beta^{(k)}\)
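The cross-fitting steps above can be sketched as follows, with assumptions labeled. A piecewise-constant "binned mean" learner stands in for the ML models; in practice one would plug in random forests, boosting, or similar. The data-generating process and all constants are hypothetical. The scalar \(X\) plays the role of a spatial coordinate that drives both \(D\) and \(Y\). The true \(\beta\) is 2.

```python
import math
import random

def binned_mean_model(xs, vals, n_bins=20):
    """Hypothetical stand-in for an ML learner: piecewise-constant fit of vals on x in [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, v in zip(xs, vals):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += v
        counts[b] += 1
    overall = sum(vals) / len(vals)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]

def dml_cross_fit(x, d, y, k=5, seed=0):
    """Partialling-out estimate of beta in Y = beta*D + g(X) + eps with K-fold cross-fitting."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    betas = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Nuisance models mu(X) = E[Y|X] and e(X) = E[D|X], trained on the other K-1 folds
        mu_hat = binned_mean_model([x[i] for i in train], [y[i] for i in train])
        e_hat = binned_mean_model([x[i] for i in train], [d[i] for i in train])
        # Residualize only the held-out fold
        y_res = [y[i] - mu_hat(x[i]) for i in held_out]
        d_res = [d[i] - e_hat(x[i]) for i in held_out]
        # No-intercept regression of residualized Y on residualized D gives beta^(k)
        betas.append(sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res))
    return sum(betas) / k  # average of the K fold-specific estimates

def naive_slope(d, y):
    """OLS slope of y on d alone, ignoring X."""
    md, my = sum(d) / len(d), sum(y) / len(y)
    return sum((a - md) * (b - my) for a, b in zip(d, y)) / sum((a - md) ** 2 for a in d)

# Hypothetical data: X drives both D and Y; true beta = 2
rng = random.Random(42)
n = 4000
x = [rng.random() for _ in range(n)]
d = [math.sin(2 * math.pi * xi) + rng.gauss(0, 1) for xi in x]
y = [2.0 * di + 3.0 * math.sin(2 * math.pi * xi) + rng.gauss(0, 1) for xi, di in zip(x, d)]
print(f"naive: {naive_slope(d, y):.2f}, DML: {dml_cross_fit(x, d, y):.2f}")
```

Here \(X\) is fully observed, so this only illustrates the DML mechanics. The naive slope is pulled to about 3 because \(X\) shifts both \(D\) and \(Y\). The cross-fitted estimate should land near 2.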

Why DML alone may not be enough

  • In practice, confounders vary over space with idiosyncratic differences, mixing spatial and non-spatial variations

  • Interference: outcomes may depend on exposures at neighboring locations and past time points

  • DML does not explicitly address causal identifiability or omitted variable bias

We will further leverage variation over time and space to help identify causal effects in the presence of unmeasured confounding

A panel data approach

  • Setup: panel with \(N\) locations over \(T\) time points; exposure \(D_{it}\) and outcome \(Y_{it}\)

  • Latent variable modeling: residual correlations in space and time reflect unmeasured confounders

    • capture long-range/global correlations
    • borrow strength from sparse or irregular data
    • robust to nonstationarity

A panel data approach

  • Negative controls: exposures or outcomes from other locations/times serve as negative controls.
    • Allow for some spillover / interference
  • Model-agnostic: fit any spatiotemporal model for observed data and apply our method to the residuals

Spatial Causal Inference

  • Panel at \(N\) locations over \(T\) time points: exposure \(D_{it}\), outcome \(Y_{it}\)

  • Population average treatment effect (PATE) of \(D\) on \(Y\): \[\mathbb{E}\left[ Y_{it}\bigl(\mathbf d^{(1)}_{\mathcal N_{it}}\bigr) -Y_{it}\bigl(\mathbf d^{(2)}_{\mathcal N_{it}}\bigr)\right]\]

  • In general, PATE is not the same as \[\mathbb{E}\left[Y_{it}\mid \mathbf D_{\mathcal N_{it}} =\mathbf d^{(1)}_{\mathcal N_{it}}\right] - \mathbb{E}\left[Y_{it}\mid\mathbf D_{\mathcal N_{it}} =\mathbf d^{(2)}_{\mathcal N_{it}}\right]\]

Assumptions

Assumption (Limited interference)

For every \((i,t)\) and exposure \(\mathbf d\), \(Y_{it}(\mathbf d)=Y_{it}(\mathbf d_{\mathcal N_{it}})\), where \(\mathbf d_{\mathcal N_{it}} :=\{d_{jk}:(j,k)\in\mathcal N_{it}\}\).

Assumption (Latent positivity)

\(f_{\mathbf D_{\mathcal N_{it}}\mid \mathbf X_{it},\mathbf S_i,\mathbf U_{it}} (\mathbf d\mid \mathbf x, \mathbf s, \mathbf u) > 0\) for every \((\mathbf d, \mathbf x, \mathbf s, \mathbf u)\) in the support.

Assumption (Latent unconfoundedness)

\(Y_{it}(\mathbf d_{\mathcal N_{it}}) \perp \!\!\! \perp\mathbf D_{\mathcal N_{it}} \mid (\mathbf X_{it},\,\mathbf S_i,\,\mathbf U_{it})\) for all \(\mathbf d_{\mathcal N_{it}}\).

Structural Equation Model

Assume the exposures and outcomes are linear in a latent \(M\)-dimensional Gaussian confounder:

\[\begin{align} \mathbf U_{t} &\sim \mathcal{N}_M(0, I_M)\\ \mathbf{D}_t &= \nu(\mathbf X_t) + B \mathbf U_{t} + \boldsymbol{\xi}_t \\ \mathbf{Y}_t &= g(\mathbf{D}_t, \mathbf{X}_t) + \Gamma\Sigma_{U\mid D}^{-1/2} \mathbf U_{t} + \boldsymbol{\epsilon}_t \end{align}\] where \(\Sigma_{U\mid D}\) is the conditional covariance of the unmeasured confounders given the exposures

Bias is \(\Gamma\Sigma_{U\mid D}^{-1/2}E[U \mid D]\)
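A short derivation makes the bias explicit. It assumes, beyond what the slide states, that \(\boldsymbol\xi_t \sim \mathcal N(0, \Lambda_D)\) is independent of \(\mathbf U_t\), and that \(\boldsymbol\epsilon_t\) has mean zero and covariance \(\Lambda_Y\) and is independent of both. Standard Gaussian conditioning, given \(\mathbf X_t\), gives

\[\begin{aligned} \Sigma_D &:= \operatorname{Cov}(\mathbf D_t \mid \mathbf X_t) = BB^{\top} + \Lambda_D, \qquad \operatorname{Cov}(\mathbf U_t, \mathbf D_t \mid \mathbf X_t) = B^{\top},\\ E[\mathbf U_t \mid \mathbf D_t, \mathbf X_t] &= B^{\top}\Sigma_D^{-1}\bigl(\mathbf D_t - \nu(\mathbf X_t)\bigr), \qquad \Sigma_{U\mid D} = I_M - B^{\top}\Sigma_D^{-1}B,\\ E[\mathbf Y_t \mid \mathbf D_t, \mathbf X_t] &= g(\mathbf D_t, \mathbf X_t) + \Gamma\Sigma_{U\mid D}^{-1/2} B^{\top}\Sigma_D^{-1}\bigl(\mathbf D_t - \nu(\mathbf X_t)\bigr),\\ \operatorname{Cov}(\mathbf Y_t \mid \mathbf D_t, \mathbf X_t) &= \Gamma\Sigma_{U\mid D}^{-1/2}\,\Sigma_{U\mid D}\,\Sigma_{U\mid D}^{-1/2}\Gamma^{\top} + \Lambda_Y = \Gamma\Gamma^{\top} + \Lambda_Y. \end{aligned}\]

The bias is therefore linear in the centered exposures. The \(\Sigma_{U\mid D}^{-1/2}\) scaling is what gives the conditional outcome covariance the factor form \(\Gamma\Gamma^{\top} + \Lambda_Y\).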

When is the bias (partially) identifiable?

Structural Equation Model

The proposed model implies a factor structure: \[\begin{aligned} \operatorname{Cov}(\mathbf D_t \mid \mathbf{X}) &= BB^{\top} + \Lambda_D\\ \operatorname{Cov}(\mathbf Y_t \mid \mathbf D_t) &= \Gamma\Gamma^{\top} + \Lambda_Y \end{aligned} \]

\(\Gamma\) and \(B\) are the outcome and exposure factor loadings, respectively. Identifying assumptions are well established (anderson1965statistical?).
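Even when \(BB^{\top}\) and \(\Lambda_D\) are identified, the loadings themselves are only determined up to an orthogonal rotation. Since \(\mathbf U_t \sim \mathcal N_M(0, I_M)\) is rotation invariant, \(B\) and \(B\Theta\) imply the same covariance. A minimal numerical check with hypothetical loadings (\(N = 3\), \(M = 2\)) follows. This rotational ambiguity is why an orthogonal \(\Theta\) appears in the bias bound.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def rotation(theta):
    """2 x 2 rotation matrix, an element of the orthogonal group O_2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

B = [[1.0, 0.2], [0.5, -0.3], [0.1, 0.8]]      # hypothetical loadings: N = 3 units, M = 2 factors
B_rot = matmul(B, rotation(0.7))                 # rotated loadings B @ Theta
common = matmul(B, transpose(B))                 # B B^T
common_rot = matmul(B_rot, transpose(B_rot))     # (B Theta)(B Theta)^T
max_diff = max(abs(p - q) for r1, r2 in zip(common, common_rot) for p, q in zip(r1, r2))
print(f"max |BB^T - (B Theta)(B Theta)^T| = {max_diff:.1e}")
```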

Bounding the Bias

Proposition

Under the proposed model and assumptions on factor identifiability, the causal effect \(g(\cdot)\) is partially identified. Let \(\check \gamma_i\) be the \(i\)th row of \(\check \Gamma\). For site \(i\), the omitted variable bias for exposure vector \(\mathbf d\) is
\[ \text{Bias}(\mathbf d)_i = \check \gamma_i \Theta \check \Sigma_{U \mid D}^{-1/2} \check B^{\top} \Sigma_{D}^{-1} \mathbf d \in \bigl[-b_i(\mathbf d),\, b_i(\mathbf d)\bigr], \qquad b_i(\mathbf d) = \|\check \gamma_i\|_2 \, \bigl\|\check \Sigma_{U \mid D}^{-1/2}\check B^{\top}\Sigma_{D}^{-1}\mathbf d\bigr\|_2. \]

  • \(\Theta \in \mathcal O_M\) is an orthogonal matrix.
  • The interval on the right is identifiable for all \(i\).
  • \(\Theta\), and hence \(g(\cdot)\), remain unidentified without further assumptions.
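The bound in the proposition follows from Cauchy-Schwarz. Let \(\mathbf v = \check \Sigma_{U \mid D}^{-1/2}\check B^{\top}\Sigma_{D}^{-1}\mathbf d\). Because \(\Theta\) is orthogonal, it preserves \(\|\mathbf v\|_2\), so \(|\check\gamma_i \Theta \mathbf v| \le \|\check\gamma_i\|_2\,\|\mathbf v\|_2\). A minimal numerical check for \(M = 2\) uses hypothetical values for \(\check\gamma_i\) and \(\mathbf v\). Sweeping over 2-D rotations (a subset of \(\mathcal O_2\) that already attains both endpoints) traces out the whole interval and never leaves it.

```python
import math

def bias_for_rotation(gamma_i, v, theta):
    """gamma_i . (Theta v) for the 2-D rotation Theta by angle theta (M = 2)."""
    c, s = math.cos(theta), math.sin(theta)
    rotated = [c * v[0] - s * v[1], s * v[0] + c * v[1]]
    return gamma_i[0] * rotated[0] + gamma_i[1] * rotated[1]

def bias_bound(gamma_i, v):
    """Identified half-width ||gamma_i||_2 * ||v||_2 (Cauchy-Schwarz)."""
    return math.hypot(*gamma_i) * math.hypot(*v)

gamma_i = [0.8, -0.4]   # hypothetical row of the identified outcome loadings
v = [0.3, 0.5]          # hypothetical value of Sigma_{U|D}^{-1/2} B^T Sigma_D^{-1} d
biases = [bias_for_rotation(gamma_i, v, 2 * math.pi * k / 3600) for k in range(3600)]
bound = bias_bound(gamma_i, v)
print(f"range over rotations: [{min(biases):.4f}, {max(biases):.4f}], bound: +/-{bound:.4f}")
```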

Limits on Interference

Assumption (Off-Neighborhood Rank — informal)

  • Let \(\mathcal N_i\) be the interference neighborhood of unit \(i\) (units \(j\) such that \(\partial g_i(\mathbf D_t)/\partial D_{jt}\neq 0\)).
  • There exist \(M\) “informative” units \(i_1,\dots,i_M\) such that:
    1. Their outcome loadings \(\{\gamma_{i_\ell}\}_{\ell=1}^M\) are linearly independent (span \(\mathbb R^M\));
    2. Considering only indices outside each neighborhood (\(j\notin\mathcal N_{i_\ell}\)), the exposure–confounder directions have full row rank \(M\).

In practice, \(M \ll N\) and neighborhoods are small, so this condition is mild and typically satisfied.

Identification Result

Theorem

Under the structural model and identification assumptions, the causal effect functions \(g_i\bigl((D_{jt})_{j \in \mathcal N_i}\bigr)\) are identified for all units \(i\).

Intuition:

  • The \(N \times N\) bias matrix \(C = \Gamma \Sigma_{U \mid D}^{-1/2} B^\top \Sigma_D^{-1}\) has rank \(M\)

  • For \(j \notin \mathcal N_i\), any association between \(Y_{it}\) and \(D_{jt}\) reflects unmeasured confounding and identifies \(C_{ij}\).

  • The rank condition ensures enough entries of \(C\) are known to recover the whole matrix.
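A toy version of this intuition uses simplifying assumptions that are ours rather than the deck's. Take \(M = 1\), so \(C = \mathbf a \mathbf b^{\top}\) has rank one. Let each unit's neighborhood be just itself, so only the diagonal of \(C\) is unknown. For distinct \(i, j, k\) with \(C_{kj} \neq 0\), \(C_{ij} C_{ki} / C_{kj} = a_i b_j \, a_k b_i / (a_k b_j) = a_i b_i = C_{ii}\). The vectors below are hypothetical.

```python
def recover_diagonal(c_obs):
    """Fill in the unknown diagonal of a rank-one matrix from its off-diagonal entries.

    c_obs[i][j] is observed for i != j; diagonal entries are unknown (None).
    Uses C_ii = C_ij * C_ki / C_kj for j, k distinct from i and from each other.
    Assumes at least 3 units and nonzero off-diagonal entries.
    """
    n = len(c_obs)
    diag = []
    for i in range(n):
        j, k = [m for m in range(n) if m != i][:2]
        diag.append(c_obs[i][j] * c_obs[k][i] / c_obs[k][j])
    return diag

a = [0.5, -1.2, 0.9, 2.0]   # hypothetical outcome-side factor
b = [1.1, 0.4, -0.7, 0.3]   # hypothetical exposure-side factor
true_c = [[ai * bj for bj in b] for ai in a]              # rank-one bias matrix C = a b^T
observed = [[None if i == j else true_c[i][j] for j in range(len(b))] for i in range(len(a))]
recovered = recover_diagonal(observed)
print([round(r, 3) for r in recovered])
```

With larger neighborhoods or \(M > 1\), there is no closed-form ratio like this. One instead relies on the rank condition to pin down the missing entries of the low-rank matrix.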

Interactive Fixed Effects Model

  • Closely related approach from econometrics (Bai 2009)

  • Treats all parameters as fixed effects, with identification in an asymptotic framework (\(N, T \to \infty\))

  • Does not explicitly address causal identifiability or omitted variable bias

  • Assumes outcomes are linear in the exposures

Simulation Study

Estimators compared

  • DML (NUC): Adjusts for observed covariates, treating unmeasured confounding as a smooth function of space and time (Chernozhukov et al. 2018)

  • IFE: Interactive fixed effects estimator (Bai 2009)

  • FC (Proposed): Factor confounding approach, explicitly modeling latent confounders

Simulation: Comparing Estimators

Simulation: Spatial Interference

Effect of PM\(_{2.5}\) on Birth Weight

  • Meta-analyses: Birth weight decreases by 16–28 g per 10 μg/m\(^3\) increase in PM\(_{2.5}\) exposure during pregnancy.

  • Outcome: 2018–2024 ZIP code–level birth counts by weight category (California Vital Data, Cal-ViDa Query Tool)

  • Exposure: Annual ZIP code–level PM\(_{2.5}\) concentrations, derived from high-resolution estimates (Shen et al. 2024)

  • Observed confounders: Maternal age, race, education, nativity, prenatal care timing, household income, geographic coordinates, and calendar year

Effect of PM\(_{2.5}\) on Birth Weight

Validation in Causal Settings

  • In general, we do not know the true causal effect in observational data

  • Predictive accuracy is not a good substitute for causal accuracy

  • Negative control / placebo checks: PM\(_{2.5}\) in the year after birth should not affect birth weight (structural assumption in our model)

  • Robustness to covariate adjustment: if our method handles unmeasured confounding, estimates should not change much when adding or removing observed covariates
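Here is a minimal sketch of the negative control exposure check. It uses a deliberately simplified, hypothetical data-generating process: one cross-section, autocorrelated exposure, and a single unmeasured confounder \(U\) that drives both exposures. The sketch estimates the coefficient on post-birth exposure in a regression of the outcome on exposure during pregnancy plus exposure in the following year. Without unmeasured confounding that coefficient is near zero. When \(U\) also drives the outcome, a plain regression picks up a spurious "effect" of future exposure, which is the signal the check looks for. This is not the deck's spatiotemporal estimator, only the logic of the check.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def residualize(x, y):
    """Residuals of y after a least-squares fit on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    s = slope(x, y)
    return [b - my - s * (a - mx) for a, b in zip(x, y)]

def placebo_coefficient(confounded, n=5000, seed=3):
    """Coefficient on next-year exposure in Y ~ D_birth + D_next (Frisch-Waugh-Lovell)."""
    rng = random.Random(seed)
    c = 1.0 if confounded else 0.0               # hypothetical effect of U on the outcome
    d_birth, d_next, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # unmeasured confounder
        db = u + rng.gauss(0, 1)                 # exposure during pregnancy
        dn = 0.5 * db + u + rng.gauss(0, 1)      # exposure in the year after birth
        d_birth.append(db)
        d_next.append(dn)
        y.append(-0.5 * db + c * u + rng.gauss(0, 1))  # outcome ignores dn by construction
    return slope(residualize(d_birth, d_next), residualize(d_birth, y))

print(f"no U in outcome: {placebo_coefficient(False):.2f}, U in outcome: {placebo_coefficient(True):.2f}")
```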

Robustness to Covariate Adjustment

Results

Takeaways

  • Double machine learning with spatial location as a proxy for unmeasured confounders may not fully account for confounding bias

  • Latent variable models can help account for unmeasured confounding

  • For the proposed model, with mild rank and limited interference assumptions, causal effects are identifiable and unbiased estimates can be achieved from spatiotemporal data

Future Directions

  • More general forms of confounding, non-linear latent variable models, etc.

  • Analytic results are much more complicated in non-linear latent variable models

  • Need to identify \(E[U \mid D, X]\)

  • Causal inference with tensor data (multiple outcomes / exposures across space and time)

  • Mixtures of multiple pollutants, multiple health outcomes, etc.

Acknowledgements

  • Lead author: Jiaxi Wu (former PhD student, now at Amazon)

References

Bai, Jushan. 2009. “Panel Data Models with Interactive Fixed Effects.” Econometrica 77 (4): 1229–79.
Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. 2018. “Double/Debiased Machine Learning for Treatment and Structural Parameters.” The Econometrics Journal 21 (1): C1–C68.
Cinelli, Carlos, and Chad Hazlett. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (1): 39–67.
Gilbert, Brian, Abhirup Datta, and Elizabeth Ogburn. 2021. “A Causal Inference Framework for Spatial Confounding.” arXiv Preprint arXiv:2112.14946.
Gong, Chen, Jianmei Wang, Zhipeng Bai, David Q Rich, and Yujuan Zhang. 2022. “Maternal Exposure to Ambient PM2.5 and Term Birth Weight: A Systematic Review and Meta-Analysis of Effect Estimates.” Science of The Total Environment 807: 150744.
Shen, Siyuan, Chi Li, Aaron Van Donkelaar, Nathan Jacobs, Chenguang Wang, and Randall V Martin. 2024. “Enhancing Global Estimation of Fine Particulate Matter Concentrations by Including Geophysical a Priori Information in Deep Learning.” ACS ES&T Air 1 (5): 332–45.
Shi, Xu, Wang Miao, and Eric Tchetgen Tchetgen. 2020. “A Selective Review of Negative Control Methods in Epidemiology.” Current Epidemiology Reports 7 (4): 190–202.

Thank You!