Causal Inference From Observational Data

  • Consider a treatment \(D\) and outcome \(Y\)

  • We are interested in the population average treatment effect (PATE) of \(D\) on \(Y\): \[E[Y | do(D=d)] - E[Y | do(D=d')]\]

  • In general, the PATE is not the same as the observed difference in conditional means \[E[Y | D=d] - E[Y | D=d']\]
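A small simulation makes the gap between intervening and conditioning concrete. The sketch below assumes a hypothetical linear structural model (not from the NHANES data) in which a binary confounder `u` raises both the probability of treatment and the outcome, and the true effect of \(D\) on \(Y\) is 1. Setting \(D\) for every unit mimics \(do(D=d)\); the naive contrast conditions on the observed treatment instead.

```r
set.seed(1)
n <- 1e5

# Hypothetical structural model: U -> D, U -> Y, D -> Y (true effect of D = 1)
u <- rbinom(n, 1, 0.5)
simulate_y <- function(d, u) 1 * d + 2 * u + rnorm(length(u))

# Observational regime: treatment probability depends on U
d <- rbinom(n, 1, ifelse(u == 1, 0.8, 0.2))
y <- simulate_y(d, u)
naive <- mean(y[d == 1]) - mean(y[d == 0])   # E[Y | D=1] - E[Y | D=0]

# Interventional regime: do(D = 1) vs do(D = 0) for the same units
pate <- mean(simulate_y(1, u)) - mean(simulate_y(0, u))

c(naive = naive, pate = pate)
```

Under this model \(P(U=1 | D=1) = 0.8\) and \(P(U=1 | D=0) = 0.2\), so the naive contrast is close to 1 + 2(0.8 - 0.2) = 2.2, more than double the true effect of 1.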

Confounders

A confounder \(U\) affects both the treatment \(D\) and the outcome \(Y\); we need to control for \(U\) to consistently estimate the causal effect
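As a sketch under assumed, purely illustrative coefficients, the simulation below shows that a regression omitting \(U\) is biased, while adding \(U\) as a covariate recovers the causal coefficient. The final line shows the source of the bias: \(U\) is much more common in the treated arm.

```r
set.seed(2)
n <- 1e5

# Hypothetical model: binary confounder U affects both D and Y; true effect of D on Y is 1
u <- rbinom(n, 1, 0.5)
d <- rbinom(n, 1, ifelse(u == 1, 0.8, 0.2))
y <- 1 * d + 2 * u + rnorm(n)

unadjusted <- coef(lm(y ~ d))["d"]      # omits U: absorbs part of U's effect
adjusted   <- coef(lm(y ~ d + u))["d"]  # controls for U: close to 1

# Mean of U in each treatment arm (d = 0, d = 1): roughly 0.2 vs 0.8
tapply(u, d, mean)
```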

Confounding bias

  • A regression of \(Y\) on \(D\) in the observed data fails because the distribution of \(U\) differs between the two treatment arms

  • We try to condition on as many observed confounders as possible to mitigate potential confounding bias

  • It is commonly assumed that there are “no unobserved confounders” (NUC), but this assumption cannot be verified from the observed data

Unmeasured Confounding

  • When there are unmeasured confounders, additional assumptions are needed to identify causal effects

  • Sensitivity analysis: how strong would unmeasured confounding have to be to explain away the observed association? (Cinelli and Hazlett 2020)

  • Null controls: use negative control exposures or outcomes to detect and adjust for unmeasured confounding (Shi, Miao, and Tchetgen Tchetgen 2020)
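One summary from Cinelli and Hazlett (2020) is the robustness value, which needs only the treatment's t statistic and the residual degrees of freedom. The sketch below hand-codes the point-estimate version of that formula (their `sensemakr` R package provides a full implementation); the helper names are hypothetical. As an illustration it plugs in the t value and residual degrees of freedom reported for `drinking` in the blood mercury regression in this deck.

```r
# Robustness value RV_q: confounders explaining at least RV_q of the residual
# variance of BOTH treatment and outcome would be strong enough to reduce the
# estimate by 100*q percent (q = 1 means "explain away" entirely).
robustness_value <- function(t_stat, dof, q = 1) {
  f_q <- q * abs(t_stat) / sqrt(dof)   # scaled partial Cohen's f of the treatment
  0.5 * (sqrt(f_q^4 + 4 * f_q^2) - f_q^2)
}

# Partial R^2 of the treatment with the outcome, given the observed covariates
partial_r2 <- function(t_stat, dof) t_stat^2 / (t_stat^2 + dof)

# Values taken from the lm() output for drinking (t = 3.744, 1434 residual df)
rv <- robustness_value(3.744, 1434)
r2 <- partial_r2(3.744, 1434)
round(c(rv = rv, partial_r2 = r2), 4)
```

A robustness value of about 0.094 says that confounders explaining roughly 9.4% of the residual variance of both drinking and the outcome would suffice to bring that point estimate to zero.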

A Simple Example


  • Observational data from the National Health and Nutrition Examination Survey (NHANES) on alcohol consumption

  • Light alcohol consumption is positively correlated with blood levels of HDL (“good cholesterol”)

  • Define “light alcohol consumption” as 1-2 alcoholic beverages per day

  • Non-drinkers: self-reported consumption of one drink per week or less

  • Control for age, gender, and an indicator of educational attainment
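A minimal sketch of how a binary treatment could be coded from these definitions. It assumes a hypothetical variable `drinks_per_week` and assumes that respondents fitting neither definition (more than 1 but fewer than 7 drinks per week, or more than 14) are excluded; the slides do not say how those respondents were handled.

```r
# Hypothetical coding of the treatment from self-reported weekly drink counts:
#   light drinking, 1-2 drinks/day  -> 7 to 14 drinks per week -> drinking = 1
#   non-drinker, <= 1 drink/week                              -> drinking = 0
#   anything else                   -> NA (excluded; an assumption)
code_drinking <- function(drinks_per_week) {
  ifelse(drinks_per_week >= 7 & drinks_per_week <= 14, 1L,
         ifelse(drinks_per_week <= 1, 0L, NA_integer_))
}

code_drinking(c(0, 1, 3, 7, 10, 14, 20))
```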

HDL and alcohol consumption


What must be true for this correlation to be non-causal?

Blood mercury and alcohol consumption

summary(lm(Y[, "Methylmercury"] ~ drinking + X))

Call:
lm(formula = Y[, "Methylmercury"] ~ drinking + X)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3570 -0.7363 -0.0728  0.6242  4.1127 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.442044   0.096385   4.586 4.91e-06 ***
drinking     0.364096   0.097244   3.744 0.000188 ***
Xage         0.008186   0.001536   5.330 1.14e-07 ***
Xgender     -0.062664   0.052290  -1.198 0.230966    
Xeduc        0.269815   0.054126   4.985 6.95e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.975 on 1434 degrees of freedom
Multiple R-squared:  0.05209,   Adjusted R-squared:  0.04945 
F-statistic:  19.7 on 4 and 1434 DF,  p-value: 8.41e-16



But there is no plausible causal mechanism by which light drinking would raise blood mercury levels
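This is the logic of a negative control outcome. The sketch below uses a hypothetical simulated model (not NHANES) in which an unmeasured factor `u` raises both drinking and mercury, while drinking has no effect on mercury at all. Regressing mercury on drinking still gives a clearly nonzero coefficient, and it disappears only once `u` is controlled for.

```r
set.seed(3)
n <- 1e5

# Hypothetical model: U -> drinking, U -> mercury; drinking has NO effect on mercury
u        <- rnorm(n)
drinking <- rbinom(n, 1, plogis(u))
mercury  <- 0 * drinking + 0.5 * u + rnorm(n)

# Omitting U: a spurious "effect" driven entirely by confounding
naive_coef <- coef(lm(mercury ~ drinking))["drinking"]

# Controlling for U: the coefficient is close to the true value of 0
adjusted_coef <- coef(lm(mercury ~ drinking + u))["drinking"]

c(naive = naive_coef, adjusted = adjusted_coef)
```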

Residual Correlation

hdl_fit <- lm(Y[, "HDL"] ~ drinking + X)
mercury_fit <- lm(Y[, "Methylmercury"] ~ drinking + X)

cor.test(hdl_fit$residuals, mercury_fit$residuals)

    Pearson's product-moment correlation

data:  hdl_fit$residuals and mercury_fit$residuals
t = 3.7569, df = 1437, p-value = 0.0001789
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.04718758 0.14953581
sample estimates:
      cor 
0.0986225 


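Residual correlation between the two outcomes may indicate confounding bias: an unmeasured confounder that affects both outcomes leaves a shared component in both sets of residuals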
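The sketch below illustrates the mechanism with simulated data, mirroring the `cor.test` call above under an assumed, hypothetical model in which `u` affects drinking and both outcomes. Residuals from regressions that omit `u` share its contribution and are correlated; adding `u` to both regressions removes the correlation.

```r
set.seed(4)
n <- 1e4

# Hypothetical model: U -> drinking, U -> HDL, U -> mercury; drinking -> HDL only
u        <- rnorm(n)
drinking <- rbinom(n, 1, plogis(u))
hdl      <- 0.5 * drinking + 0.5 * u + rnorm(n)
mercury  <- 0.5 * u + rnorm(n)

# Omitting U: both residual vectors contain the leftover U component
r_omit <- cor(resid(lm(hdl ~ drinking)), resid(lm(mercury ~ drinking)))

# Including U: the remaining noise terms are independent, so correlation is near 0
r_adj <- cor(resid(lm(hdl ~ drinking + u)), resid(lm(mercury ~ drinking + u)))

c(omitting_u = r_omit, adjusting_u = r_adj)
```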

Multivariate Causal Inference and Unmeasured Confounding

The effect of pollution on birth weight