Sensitivity to Unobserved Confounding in Studies with Factor-structured Outcomes

Alexander Franks

9/28/23

Slides and Paper

  • Slides: afranks.com/talks

  • Sensitivity to Unobserved Confounding in Studies with Factor-structured Outcomes, (JASA, 2023) https://arxiv.org/abs/2208.06552

  • Joint work with Jiajing Zheng (formerly UCSB), Jiaxi Wu (UCSB) and Alex D’Amour (Google)

Causal Inference From Observational Data

  • Consider a treatment \(T\) and outcome \(Y\)

  • Interested in the population average treatment effect (PATE) of \(T\) on \(Y\): \[E[Y | do(T=t)] - E[Y | do(T=t')]\]

  • In general, the PATE is not the same as \[E[Y | T=t] - E[Y | T=t']\]

Confounders

Need to control for \(U\) to consistently estimate the causal effect

Confounding bias

  • The observed-data regression of \(Y\) on \(T\) fails because the distribution of \(U\) differs between the two treatment arms
  • We try to condition on as many observed confounders as possible to mitigate potential confounding bias
  • Commonly assumed that there are “no unobserved confounders” (NUC) but this is unverifiable
  • Sensitivity analysis is a tool for assessing the impacts of violations of this assumption

A Motivating Example


The Effects of Light Alcohol Consumption

  • Observational data from the National Health and Nutrition Examination Survey (NHANES) on alcohol consumption.

  • Light alcohol consumption is positively correlated with blood levels of HDL (“good cholesterol”)

  • Define “light alcohol consumption” as 1-2 alcoholic beverages per day

  • Non-drinkers: self-reported drinking of one drink a week or less

  • Control for age, gender and indicator for educational attainment

HDL and alcohol consumption

summary(lm(Y[, "HDL"] ~ drinking + X))

Call:
lm(formula = Y[, "HDL"] ~ drinking + X)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0855 -0.6127 -0.0512  0.6389  4.2383 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.225550   0.091105   2.476 0.013412 *  
drinking    0.597399   0.091917   6.499 1.11e-10 ***
Xage        0.006409   0.001452   4.415 1.09e-05 ***
Xgender     0.689557   0.049426  13.951  < 2e-16 ***
Xeduc       0.194338   0.051161   3.799 0.000152 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9216 on 1434 degrees of freedom
Multiple R-squared:  0.1531,    Adjusted R-squared:  0.1507 
F-statistic: 64.81 on 4 and 1434 DF,  p-value: < 2.2e-16


What must be true for this correlation to be non-causal?

Blood mercury and alcohol consumption

summary(lm(Y[, "Methylmercury"] ~ drinking + X))

Call:
lm(formula = Y[, "Methylmercury"] ~ drinking + X)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.3570 -0.7363 -0.0728  0.6242  4.1127 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.442044   0.096385   4.586 4.91e-06 ***
drinking     0.364096   0.097244   3.744 0.000188 ***
Xage         0.008186   0.001536   5.330 1.14e-07 ***
Xgender     -0.062664   0.052290  -1.198 0.230966    
Xeduc        0.269815   0.054126   4.985 6.95e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.975 on 1434 degrees of freedom
Multiple R-squared:  0.05209,   Adjusted R-squared:  0.04945 
F-statistic:  19.7 on 4 and 1434 DF,  p-value: 8.41e-16


But… no plausible causal mechanism in this case

Residual Correlation

hdl_fit <- lm(Y[, "HDL"] ~ drinking + X)
mercury_fit <- lm(Y[, "Methylmercury"] ~ drinking + X)

cor.test(hdl_fit$residuals, mercury_fit$residuals)

    Pearson's product-moment correlation

data:  hdl_fit$residuals and mercury_fit$residuals
t = 3.7569, df = 1437, p-value = 0.0001789
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.04718758 0.14953581
sample estimates:
      cor 
0.0986225 


Residual correlation might be indicative of confounding bias

Sensitivity Analysis

  • NUC unlikely to hold exactly. What then?

  • Calibrate assumptions about confounding to explore the range of plausible causal effects

  • Robustness: quantify how “strong” confounding has to be to nullify causal effect estimates

  • Well-established methods exist for single-outcome analyses

Multi-outcome Sensitivity Analysis

  • If we measure multiple outcomes, is there prior knowledge that we can leverage to strengthen causal conclusions?
  • What might residual correlation in multi-outcome models imply about the potential for confounding?
  • How do results change when we assume a priori that certain outcomes cannot be affected by treatments?

    • Null control outcomes (e.g. alcohol consumption should not increase mercury levels)

Standard Assumptions

Assumption (Latent Ignorability)

U and X block all backdoor paths between T and Y (Pearl 2009)

Assumption (Latent positivity)

f(T = t | U = u, X = x) > 0 for all u and x

Assumption (SUTVA)

There are no hidden versions of the treatment and there is no interference between units

Single-outcome Sensitivity Analysis

Assume the outcome is linear in the treatment and confounders (no interactions). Then the squared omitted variable bias for the PATE is \[\text{Bias}_{t_1,t_2}^2 \, \leq \, \frac{(t_1-t_2)^2}{\sigma_{t\mid x}^2} \left(\frac{R^2_{T\sim U|X}}{1 - R^2_{T\sim U|X}} \right)R^2_{Y \sim U | T, X}\]

Single-outcome Sensitivity Analysis

Assume the outcome is linear in the treatment and confounders (no interactions). Then the squared omitted variable bias for the PATE is \[\text{Bias}_{t_1,t_2}^2 \, \leq \, \frac{(t_1-t_2)^2}{\sigma_{t\mid x}^2} {\color{#C43424}\left(\frac{R^2_{T\sim U|X}}{1 - R^2_{T\sim U|X}} \right)}R^2_{Y \sim U | T, X}\]

  • \(R^2_{T\sim U|X}\): partial fraction of treatment variance explained by confounders (given observed covariates)

Single-outcome Sensitivity Analysis

Assume the outcome is linear in the treatment and confounders (no interactions). Then the squared omitted variable bias for the PATE is \[\text{Bias}_{t_1,t_2}^2 \, \leq \, \frac{(t_1-t_2)^2}{\sigma_{t\mid x}^2} \left(\frac{R^2_{T\sim U|X}}{1 - R^2_{T\sim U|X}} \right){\color{#C43424}R^2_{Y \sim U | T, X}}\]

  • \(R^2_{T\sim U|X}\): partial fraction of treatment variance explained by confounders (given observed covariates)

  • \(R^2_{Y\sim U|T,X}\): partial fraction of outcome variance explained by confounders (given observed covariates and treatment)
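
As a quick numeric illustration (not from the paper), the displayed bound can be evaluated directly for hypothetical sensitivity values:

# Hypothetical sensitivity values; none of these numbers are estimated from data
R2_TU <- 0.05        # R^2_{T ~ U | X}
R2_YU <- 0.10        # R^2_{Y ~ U | T, X}
sigma2_t_x <- 0.25   # treatment variance given X (illustrative)
t1 <- 1; t2 <- 0     # treatment contrast

# Bound on |Bias| implied by the displayed formula
sqrt((t1 - t2)^2 / sigma2_t_x * R2_TU / (1 - R2_TU) * R2_YU)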

Robustness

  • How big do \(R^2_{T\sim U |X}\) and \(R^2_{Y \sim U | T, X}\) need to be to nullify the effect?
  • \(RV^1\): the smallest value of \(R^2_{T\sim U |X} = R^2_{Y \sim U | T, X}\) needed to nullify the effect (Cinelli and Hazlett 2020)
  • \(XRV\): the smallest value of \(R^2_{T\sim U |X}\) needed to nullify the effect when \(R^2_{Y \sim U | T, X}=1\) (Cinelli and Hazlett 2022)
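
A hedged sketch of how these robustness values could be computed for the HDL fit using the sensemakr package; it assumes the hdl_fit object from the earlier regression and that the coefficient of interest is drinking:

library(sensemakr)

# RV^1: confounding acting equally on T and Y needed to nullify the estimate
robustness_value(hdl_fit, covariates = "drinking", q = 1)

# XRV: with R^2_{Y~U|T,X} = 1, the robustness value reduces to the
# partial R^2 of the treatment with the outcome
partial_r2(hdl_fit, covariates = "drinking")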

Calibrating Sensitivity Parameters

  • What values of \(R^2_{Y\sim U|T, X}\) and \(R^2_{T \sim U | X}\) might be reasonable?

  • Can use observed covariates to generate benchmark values:

    • Compute \(R^2_{T \sim X_{j} | X_{-j}}\) for each covariate \(X_j\)

    • Compute \(R^2_{Y \sim X_{j} | X_{-j}, T}\) for each covariate \(X_j\)

  • Use domain knowledge to reason about most important confounders
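
A rough sketch of the benchmarking computation, assuming the drinking, Y, and X objects from the regressions above and the coefficient names shown in the earlier lm output; partial R^2 is computed from t-statistics as t^2 / (t^2 + df):

partial_r2_from_t <- function(fit, term) {
  tval <- coef(summary(fit))[term, "t value"]
  tval^2 / (tval^2 + df.residual(fit))
}

# R^2_{T ~ X_j | X_-j}: treatment variance explained by each covariate
treat_fit <- lm(drinking ~ X)
sapply(c("Xage", "Xgender", "Xeduc"), partial_r2_from_t, fit = treat_fit)

# R^2_{Y ~ X_j | X_-j, T}: outcome variance explained by each covariate
sapply(c("Xage", "Xgender", "Xeduc"), partial_r2_from_t, fit = hdl_fit)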

Sensitivity of HDL Cholesterol Effect

From the sensemakr documentation (Cinelli, Ferwerda, and Hazlett 2020)
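
The kind of sensitivity contour plot shown here can be reproduced along these lines (a sketch following the sensemakr vignette; the choice of age as the benchmark covariate and kd = 1:3 is illustrative):

library(sensemakr)

hdl_sens <- sensemakr(model = hdl_fit,
                      treatment = "drinking",
                      benchmark_covariates = "Xage",
                      kd = 1:3)
summary(hdl_sens)
plot(hdl_sens)   # bias contours in (R^2_{T~U|X}, R^2_{Y~U|T,X}) space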

Models with factor-structured residuals


Assume the observed data mean and covariance can be expressed as follows: \[\begin{align} E[Y \mid T = t, X=x] &= \check g(t, x)\\ Cov(Y \mid T = t, X = x) &= \Gamma\Gamma' + \Lambda, \end{align}\]

  • \(\Gamma\) is the factor loading matrix, \(\Lambda\) is diagonal
  • \(\check g(T=t, X) - \check g(T=t', X)\) equals the PATE only under NUC
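
A rough frequentist sketch of this model, assuming the Y matrix, drinking, and X from the NHANES regressions; the paper uses a Bayesian factor model, so factanal here is only a stand-in:

# Regress each outcome on (drinking, X), then fit an m = 2 factor model
# to the residual matrix
resid_mat <- residuals(lm(Y ~ drinking + X))
fa_fit <- factanal(resid_mat, factors = 2, rotation = "none")

Gamma_hat  <- unclass(fa_fit$loadings)    # Gamma, up to rotation, on the correlation scale
Lambda_hat <- diag(fa_fit$uniquenesses)   # diagonal idiosyncratic variances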

A Structural Equation Model

  • \(U\) (an \(m\)-vector) and \(X\) are possible causes of \(T\) (a scalar) and \(Y\) (a \(q\)-vector)

  • \(X\) is observed but \(U\) is not.

\[\begin{align} & U = \epsilon_U \\ &T = f_{\epsilon}(X, U) \\ &Y = g(T,X) + \Gamma\Sigma_{u|t,x}^{-1/2}U + \epsilon_{y} \end{align}\]

  • This SEM is compatible with the factor-structured residuals, \(Cov(Y|T, X) = \Gamma\Gamma' + \Lambda\)

A Structural Equation Model

\[\begin{align} &U = \epsilon_U\\ &T = f_\epsilon(X,U)\\ &Y = g(T, X) + \Gamma\Sigma_{u|t,x}^{-1/2}U + \epsilon_{y} \end{align}\]

  • Confounding bias is \(\Gamma\Sigma_{u|t,x}^{-1/2}\mu_{u \mid t,x}\)

  • \(\mu_{u \mid t,x}\) and \(\Sigma_{u|t,x}\) are the conditional mean and covariance of the unmeasured confounders

    • These are user-specified sensitivity parameters

A Sensitivity Specification

  • Interpretable specification for \(\mu_{u \mid t,x}\) and \(\Sigma_{u|t,x}\) parameterized by a single \(m\)-vector, \(\rho\):

\[\begin{align} \mu_{u\mid t,x} &= \frac{\rho}{\sigma_{t \mid x}^{2}}\left(t-\mu_{t\mid x}\right), \\ \Sigma_{u \mid t,x} &= I_m-\frac{\rho \rho^{\prime}}{\sigma_{t\mid x}^{2}}, \end{align}\]

  • \(\rho\) is the partial correlation vector between \(T\) and \(U\)

  • Define \(0 \leq R^2_{T \sim U |X}:= \frac{||\rho||^2_2}{\sigma^2_{t\mid x}} < 1\) to be the squared norm of the partial correlation between T and U given \(X\)
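
A small sketch of this specification with made-up numbers; here \(\sigma^2_{t\mid x}\) is approximated by the residual variance of a linear regression of the treatment on X:

sigma2_t_x <- var(residuals(lm(drinking ~ X)))
rho <- sqrt(sigma2_t_x) * c(0.2, -0.1)    # hypothetical; gives R^2_{T~U|X} = 0.05
sum(rho^2) / sigma2_t_x                   # implied R^2_{T~U|X}

mu_u_given_tx    <- function(t, mu_t_x) rho / sigma2_t_x * (t - mu_t_x)
Sigma_u_given_tx <- diag(length(rho)) - tcrossprod(rho) / sigma2_t_x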

Multi-Outcome Assumptions

Assumption (Homoscedasticity)

\(Cov(Y |T = t, X = x)\) is invariant to t and x. Implies factor loadings, \(\Gamma\), are invariant to \(t\) and \(x\)

Assumption (Factor confounding)

The factor loadings, \(\Gamma\), are identifiable (up to rotation) and reflect all potential confounders. (Anderson and Rubin 1956)

To identify the factor loadings \(\Gamma\), we need \((q-m)^2-q-m\geq0\), and each confounder must influence at least three outcomes

Bounding the Omitted Variable Bias

Theorem (Bounding the bias for outcome \(Y_j\))

Given the structural equation model, sensitivity specification and given assumptions, the squared omitted variable bias for the PATE of outcome \(Y_j\) is bounded by \[\text{Bias}_{j}^2 \, \leq \, \frac{(t_1-t_2)^2}{\sigma_{t\mid x}^2} \left(\frac{R^2_{T\sim U|X}}{1 - R^2_{T\sim U|X}} \right)\parallel \Gamma_j\parallel_2^2\]

  • The bound on the bias for outcome \(j\) is proportional to the norm of the factor loadings for that outcome

  • A single sensitivity parameter, \(R^2_{T \sim U \mid X}\), shared across all outcomes

Bounding the Omitted Variable Bias

Theorem (Bounding the bias for outcome \(Y_j\))

Given the structural equation model, sensitivity specification and given assumptions, the squared omitted variable bias for the PATE of outcome \(Y_j\) is bounded by \[\text{Bias}_{j}^2 \, \leq \, \frac{(t_1-t_2)^2}{\sigma_{t\mid x}^2} \left(\frac{R^2_{T\sim U|X}}{1 - R^2_{T\sim U|X}} \right){\color{#C43424} \parallel \Gamma_j\parallel_2^2}\]

  • The bound on the bias for outcome \(j\) is proportional to the norm of the factor loadings for that outcome

  • A single sensitivity parameter, \(R^2_{T \sim U \mid X}\), shared across all outcomes

Bounding the Omitted Variable Bias

Theorem (Bounding the bias for outcome \(Y_j\))

Given the structural equation model, sensitivity specification and given assumptions, the squared omitted variable bias for the PATE of outcome \(Y_j\) is bounded by \[\text{Bias}_{j}^2 \, \leq \, \frac{(t_1-t_2)^2}{\sigma_{t\mid x}^2} {\color{#C43424}\left(\frac{R^2_{T\sim U|X}}{1 - R^2_{T\sim U|X}} \right)} \parallel \Gamma_j\parallel_2^2\]

  • The bound on the bias for outcome \(j\) is proportional to the norm of the factor loadings for that outcome

  • A single sensitivity parameter, \(R^2_{T \sim U \mid X}\), shared across all outcomes
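
Continuing the earlier sketches (Gamma_hat from the factanal fit above, a hypothetical \(R^2_{T \sim U \mid X}\)), the per-outcome bounds could be computed as follows; because factanal works on the correlation scale, these bounds are for standardized outcomes:

R2_TU <- 0.05                                   # hypothetical R^2_{T ~ U | X}
sigma2_t_x <- var(residuals(lm(drinking ~ X)))  # treatment variance given X
t1 <- 1; t2 <- 0

bias_bound <- abs(t1 - t2) / sqrt(sigma2_t_x) *
  sqrt(R2_TU / (1 - R2_TU)) *
  sqrt(rowSums(Gamma_hat^2))                    # ||Gamma_j||_2 per outcome
bias_bound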

Reparametrizing \(R^2_{T \sim U | X}\) for binary treatments

  • \(R^2_{T \sim U | X}\) is unnatural for binary treatments

  • \(\Lambda\)-parameterization \(\leftrightarrow\) \(R^2_{T \sim U | X}\)-parameterization

Fix a \(\Lambda_\alpha\) such that \[Pr\left(\Lambda_\alpha^{-1} \leq \frac{e_0(X, U)/(1-e_0(X, U))}{e(X)/(1-e(X))}\leq \Lambda_\alpha\right)=1-\alpha\] where \(e(X)\) is the observed propensity score and \(e_0(X, U)\) is the propensity score given both \(X\) and \(U\)

  • Related to the marginal sensitivity model (Tan 2006)

Null Control Outcomes

  • Assume we have null control outcomes, \(\mathcal{C}\)
  • \(\check \tau\) is the vector of PATEs under NUC
  • \(\Gamma_{\mathcal{C}}\) are the factor loadings for the null control outcomes, \(\mathcal{C}\)
  • Need \(R^2_{T \sim U \mid X}\geq R^2_{min}\), i.e., at least a fraction \(R^2_{min}\) of the treatment variance must be due to confounding, to nullify the null controls

Null Control Outcomes

Theorem (Bias with Null Control Outcomes)

Assume the previous structural equation model and sensitivity specification. Then the omitted variable bias for the PATE of outcome \(Y_j\) lies in the interval

\[\begin{equation} \text{Bias}_j \in \left[\Gamma_j \Gamma_{\mathcal{C}}^{\dagger} \check{\tau}_{\mathcal{C}} \; \pm \; \parallel \Gamma_j P_{\Gamma_{\mathcal{C}}}^{\perp} \parallel_2 \sqrt{ \frac{1}{\sigma_{t\mid x}^2}\left( \frac{R^2_{T \sim U | X}}{1 - R^2_{T \sim U | X}} - \frac{R^2_{min}}{1 - R^2_{min}} \right)} \right], \end{equation}\]

Null Control Outcomes

Theorem (Bias with Null Control Outcomes)

Assume the previous structural equation model and sensitivity specification. Then the omitted variable bias for the PATE of outcome \(Y_j\) lies in the interval

\[\begin{equation} \text{Bias}_j \in \left[{\color{#C43424}\Gamma_j \Gamma_{\mathcal{C}}^{\dagger} \check{\tau}_{\mathcal{C}}} \; \pm \; \parallel \Gamma_j P_{\Gamma_{\mathcal{C}}}^{\perp} \parallel_2 \sqrt{ \frac{1}{\sigma_{t\mid x}^2}\left( \frac{R^2_{T \sim U | X}}{1 - R^2_{T \sim U | X}} - \frac{R^2_{min}}{1 - R^2_{min}} \right)} \right], \end{equation}\]

  • \(\Gamma_j\Gamma_{\mathcal{C}}^{\dagger}\check \tau_{\mathcal{C}}\) is a (partial) bias correction for outcome \(j\)

Null Control Outcomes

Theorem (Bias with Null Control Outcomes)

Assume the previous structural equation model and sensitivity specification. Then the omitted variable bias for the PATE of outcome \(Y_j\) lies in the interval

\[\begin{equation} \text{Bias}_j \in \left[\Gamma_j \Gamma_{\mathcal{C}}^{\dagger} \check{\tau}_{\mathcal{C}} \; \pm \; \parallel \Gamma_j P_{\Gamma_{\mathcal{C}}}^{\perp} \parallel_2 \sqrt{ \frac{1}{\sigma_{t\mid x}^2}\color{#C43424}{\left( \frac{R^2_{T \sim U | X}}{1 - R^2_{T \sim U | X}} - \frac{R^2_{min}}{1 - R^2_{min}} \right)}} \right], \end{equation}\]

  • If \(R^2_{T \sim U | X}=R^2_{min}\) then the bias is identified for all outcomes

Null Control Outcomes

Theorem (Bias with Null Control Outcomes)

Assume the previous structural equation model and sensitivity specification. Then the omitted variable bias for the PATE of outcome \(Y_j\) lies in the interval

\[\begin{equation} \text{Bias}_j \in \left[\Gamma_j \Gamma_{\mathcal{C}}^{\dagger} \check{\tau}_{\mathcal{C}} \; \pm \; {\color{#C43424}\parallel \Gamma_j P_{\Gamma_{\mathcal{C}}}^{\perp} \parallel_2} \sqrt{ \frac{1}{\sigma_{t\mid x}^2}\left( \frac{R^2_{T \sim U | X}}{1 - R^2_{T \sim U | X}} - \frac{R^2_{min}}{1 - R^2_{min}} \right)} \right], \end{equation}\]

  • Ignorance about the bias is smallest when \(\Gamma_j\) is close to the span of \(\Gamma_{\mathcal{C}}\), that is, when \(\parallel \Gamma_j P_{\Gamma_{\mathcal{C}}}^{\perp} \parallel_2\) is small
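
A hedged sketch of the interval above with made-up inputs (m = 2 confounders, one null control; requires \(R^2_{T \sim U | X} \geq R^2_{min}\)); MASS::ginv gives the pseudoinverse \(\Gamma_{\mathcal{C}}^{\dagger}\):

library(MASS)

nc_interval <- function(Gamma_j, Gamma_C, tau_C, R2, R2_min, sigma2_t_x) {
  Gamma_C_pinv <- ginv(Gamma_C)                              # Gamma_C^dagger
  center <- drop(Gamma_j %*% Gamma_C_pinv %*% tau_C)         # partial bias correction
  P_perp <- diag(ncol(Gamma_C)) - Gamma_C_pinv %*% Gamma_C   # projection off span(Gamma_C)
  half_width <- sqrt(sum((Gamma_j %*% P_perp)^2)) *
    sqrt((R2 / (1 - R2) - R2_min / (1 - R2_min)) / sigma2_t_x)
  center + c(-1, 1) * half_width
}

# Illustrative (made-up) loadings, null-control effect, and sensitivity values
nc_interval(Gamma_j = c(0.5, 0.2),
            Gamma_C = matrix(c(0.4, -0.3), nrow = 1),
            tau_C   = 0.36,
            R2 = 0.10, R2_min = 0.02, sigma2_t_x = 0.2)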

Robustness under Factor Confounding

  • \(RV^\Gamma_j\): the smallest value of \(R^2_{T\sim U |X}\) needed to nullify the effect for outcome \(j\) under factor confounding
  • \(RV^\Gamma_j\) can be smaller or larger than \(RV^1\)

  • \(RV_j^{\Gamma} \geq XRV\) by definition

  • \(RV^\Gamma_{j, NC}\): the smallest value of \(R^2_{T\sim U |X}\) needed to nullify the effect for outcome \(j\) and the assumed null controls

Simulation Study

  • Gaussian data generating process \[\begin{align} T &= \beta' U + \epsilon_T \\ Y_j &= \tau_jT + \Gamma'\Sigma^{-1/2}_{u|t}U + \epsilon_y \end{align}\]

  • \(R^2_{T \sim U \mid X}=0.5\) from \(m=2\) unmeasured confounders

  • \(\tau_j = 0\) for \(Y_1\), \(Y_2\) and \(Y_{10}\)

  • \(\tau_j=1\) for all other outcomes
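
A sketch of one way to generate data matching this description; the specific values of beta, Gamma, and the noise scales are illustrative choices, fixed only so that \(R^2_{T \sim U}=0.5\) with \(m=2\):

set.seed(1)
n <- 2000; m <- 2; q <- 10

U <- matrix(rnorm(n * m), n, m)               # unmeasured confounders
beta <- c(1, 1) / sqrt(2)                     # Var(U %*% beta) = 1
treat <- drop(U %*% beta) + rnorm(n)          # Var(eps_T) = 1, so R^2_{T~U} = 0.5

tau <- rep(1, q); tau[c(1, 2, 10)] <- 0       # null effects for Y1, Y2, Y10
Gamma <- matrix(rnorm(q * m, sd = 0.5), q, m) # rank-two loadings (illustrative)

Sigma_u_t <- diag(m) - tcrossprod(beta) / 2   # Cov(U | T) under this DGP
eig <- eigen(Sigma_u_t)
Sigma_inv_sqrt <- eig$vectors %*% diag(1 / sqrt(eig$values)) %*% t(eig$vectors)

Y_sim <- outer(treat, tau) + U %*% Sigma_inv_sqrt %*% t(Gamma) +
  matrix(rnorm(n * q, sd = 0.5), n, q)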

Simulation Study

  • Fit a Bayesian linear regression on the 10 outcomes given the treatment

  • Assume a residual covariance with a rank-two factor structure

  • Plot ignorance regions assuming \(R^2_{T \sim U} \leq 0.5\)

  • Plot ignorance regions assuming \(R^2_{T \sim U} \leq 0.5\) and \(Y_1\) is null

Simulation Study

The effects of light drinking

  • Measure ten different outcomes from blood samples:
    • natural: HDL, LDL, triglycerides, potassium, iron, sodium, glucose
    • environmental toxicants: mercury, lead, cadmium.
  • Measured confounders: age, gender and indicator for highest educational attainment
  • Residual correlation in the outcomes might be indicative of additional confounding bias

The effects of light drinking

Model: \[\begin{align} Y &\sim N(\tau T + \alpha 'X, \Gamma\Gamma' + \Lambda) \end{align}\]

  • \(E[Y| T, X, U] = \tau T + \alpha 'X + \Gamma'\Sigma^{-1/2}_{u|t}U\)

  • Residuals are approximately Gaussian

  • Fit a multivariate Bayesian linear regression with factor structured residuals on all outcomes

  • Consider posterior distribution of \(\tau\) under different assumptions about \(R^2_{T\sim U|X}\) and null controls

Benchmark Values

  • Use age, gender and an indicator of educational attainment to benchmark

  • \(\frac{1}{3.5} \leq \text{Odds}(X)/\text{Odds}(X_{-age}) \leq 3.5\) for 95% of observed values, where \(\text{Odds}(X)\) and \(\text{Odds}(X_{-age})\) are the propensity odds of light drinking with and without age

  • For gender and education indicators the odds change was between \(\frac{1}{1.5}\) and \(1.5\)

  • Assume light drinking has no effect on methylmercury levels
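
One plausible reading of this benchmark, sketched in code; it assumes drinking is the 0/1 treatment indicator and that the X matrix has columns named age, gender, and educ:

odds <- function(p) p / (1 - p)

# Propensity models with and without age
ps_full   <- glm(drinking ~ X, family = binomial)$fitted.values
ps_no_age <- glm(drinking ~ X[, colnames(X) != "age"], family = binomial)$fitted.values

# Distribution of the change in propensity odds when age is dropped
quantile(odds(ps_full) / odds(ps_no_age), c(0.025, 0.975))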

Results: NHANES alcohol study



Takeaways

  • Prior knowledge unique to the multi-outcome setting can help inform assumptions about confounding

  • Sharper sensitivity analysis, when assumptions hold

  • Negative control assumptions can potentially provide strong evidence for or against robustness

Future directions

  • Identification with multiple treatments and multiple outcomes
    • Collaboration on the effects of pollutants on multiple health outcomes
  • Sensitivity analysis for more general models / forms of dependence.

References

Anderson, Theodore W., and Herman Rubin. 1956. “Statistical Inference in Factor Analysis.” In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 5: Contributions to Econometrics, Industrial Research, and Psychometry, 111–50. University of California Press.
Cinelli, Carlos, Jeremy Ferwerda, and Chad Hazlett. 2020. “sensemakr: Sensitivity Analysis Tools for OLS in R and Stata.” Submitted to the Journal of Statistical Software.
Cinelli, Carlos, and Chad Hazlett. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (1): 39–67.
———. 2022. “An Omitted Variable Bias Framework for Sensitivity Analysis of Instrumental Variables.” Available at SSRN 4217915.
Pearl, Judea. 2009. Causality. Cambridge University Press.
Tan, Zhiqiang. 2006. “A Distributional Approach for Causal Inference Using Propensity Scores.” Journal of the American Statistical Association 101 (476): 1619–37.

Thanks!


  • Jiaxi Wu (UCSB)

  • Jiajing Zheng (formerly UCSB)

  • Alex D’Amour (Google Research)

Sensitivity to Unobserved Confounding in Studies with Factor-structured Outcomes https://arxiv.org/abs/2208.06552