Stephens’ Group Meeting
1/23/24
Algorithm 1 from (1)
log joint:
\[\begin{align} F(\beta_1, \dots, \beta_L) &= \log p(y | \sum_l \psi_l) + \sum_{l=1}^L\log p(\psi_l), \\ \psi_l &= X \beta_l, \\ \beta_l &\sim g_l \end{align}\]
ELBO:
\[\begin{align} \mathcal F(q_1, \dots, q_L) &= \mathbb E_q \left[ F(\beta_1, \dots \beta_L) \right] + \sum_l H(q_l) \\ q(\beta_1, \dots, \beta_L) &= \prod q_l(\beta_l) \end{align}\]
Exponential family \(f(y | \eta) \propto \exp \{y \eta - A(\eta) \}\), \(A\) convex. Additive model: \(\eta = \sum \eta_l\)
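As a concrete instance (a standard fact, not restated in the notes): the Gaussian likelihood with known variance fits this form,

\[\begin{align*} N(y \mid \mu, \sigma^2) \propto \exp\left\{ y \frac{\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right\}, \end{align*}\]

so \(\eta = \mu / \sigma^2\) and \(A(\eta) = \sigma^2 \eta^2 / 2\), which is convex as required.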
\[\begin{align} \mathbb E_q[\log f(y | \eta)] &= \mathbb E_{q_l} \left[\mathbb E_{q_{-l}}[\log f(y | \eta_l + \eta_{-l})] \,|\, \eta_l\right] \\ &\leq \mathbb E_{q_l} \left[\mathbb E_{q_{-l}}\left[\log f(y | \eta_l + \bar\eta_{-l}) + \nabla_\eta \log f(y | \eta_l + \bar\eta_{-l})(\eta_{-l} - \bar \eta_{-l})\right] \,|\, \eta_l\right] \\ &= \mathbb E_{q_l} [\log f(y | \eta_l + \bar\eta_{-l})] \end{align}\]
(The inequality is the tangent bound for the concave map \(\eta \mapsto \log f(y | \eta)\), concave since \(A\) is convex; the linear term vanishes in expectation because \(\bar\eta_{-l} = \mathbb E_{q_{-l}}[\eta_{-l}]\).)
Notes:
\[\begin{align} {\bf y} | \mu & \sim N(\mu, \sigma^2) \\ \mu &= \sum_l \mu_l \\ \mu_l &\sim g_l \end{align}\]
SuSiE is the special case where each \(g_l\) is a single effect regression (SER) prior
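For reference, the SER prior used in SuSiE (standard definition, not restated in these notes): each \(\beta_l\) has exactly one nonzero coordinate,

\[\begin{align*} \beta_l &= b_l \gamma_l, \\ \gamma_l &\sim \text{Mult}(1, \pi), \\ b_l &\sim N(0, \sigma_{0l}^2). \end{align*}\]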
Fig 1. reproduced from (3)
Let \(g\) be the canonical link function, so \(\mathbb E[y] = g^{-1}(\eta)\)
\[\begin{align} y | \eta &\sim p_{g^{-1}(\eta)} \\ \eta &= \sum_l \eta_l \\ \eta_l &\sim g_l \end{align}\]
\[\begin{align} F(\beta_1, \dots, \beta_L) &= \log p(y | \sum_l \psi_l) + \sum_{l=1}^L\log p(\psi_l) \end{align}\]
\[\begin{align} \beta_l^* = \arg\max_{\beta_l} F(\beta_1, \dots, \beta_L) \end{align}\]
Notes:
Figure 1: Wakefield’s ABF can be an order of magnitude off when the \(z\)-score is large
\[ \text{ABF} = \sqrt{\frac{V+W}{V}} \exp (- \frac{z^2}{2} \frac{W}{V + W}) \]
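A minimal numerical sketch of this formula (the function name and inputs are illustrative; \(V\) is the sampling variance of the effect estimate and \(W\) the prior variance on the effect):

```python
import numpy as np

def abf(z, V, W):
    """Wakefield's approximate Bayes factor in favor of the null,
    from a marginal z-score, sampling variance V, and prior variance W."""
    return np.sqrt((V + W) / V) * np.exp(-0.5 * z**2 * W / (V + W))

# The ABF shrinks toward 0 (evidence against the null) as |z| grows.
```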
\[\begin{align} \mathcal F(q_1, \dots, q_L) = \mathbb E_q \left[ F(\beta_1, \dots \beta_L) \right] + \sum_l H(q_l) \end{align}\]
Coordinate ascent on \((\beta_l)\). For SuSiE, at each step fit an SER, but return only the posterior mode of the SER, rather than the mean. Essentially stepwise selection, but reports a posterior distribution for each effect.

## Backfitting
Maximize the expected log likelihood
\[ \begin{align} \mathbb E[ l(\hat{\eta}(X), Y)] = \max_{\eta} \mathbb E[l(\eta(X), Y)] \end{align} \]
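A generic backfitting loop might look like the following sketch (the helper names are hypothetical; `fit_component` stands in for whatever per-component fit, e.g. an SER, the actual algorithm uses):

```python
import numpy as np

def backfit(X_list, y, fit_component, n_iter=20):
    """Cyclically refit each additive component to its partial residual.
    fit_component(X, r) returns fitted values for one component."""
    fitted = [np.zeros_like(y, dtype=float) for _ in X_list]
    for _ in range(n_iter):
        for l, X in enumerate(X_list):
            r = y - (sum(fitted) - fitted[l])  # residual excluding component l
            fitted[l] = fit_component(X, r)
    return fitted

# Example component fit: least squares on one design matrix
ols = lambda X, r: X @ np.linalg.lstsq(X, r, rcond=None)[0]
```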
Most fine-mapping methods assume summary statistics from marginal association studies are normally distributed, with covariance determined by LD:
\[\begin{align*} \hat{ {\bf z} } \sim N({\bf z}, R) \end{align*}\]
This is a statistical property of OLS: what if the marginal effects come from somewhere else?
Method | Notes | Summary stats |
---|---|---|
Generalized IBSS | “correct” model, heuristic algorithm | No |
Logistic + RSS | ad hoc, actually used (8,9) | Yes |
Linear + RSS | mis-specified model, correct algorithm | Yes |
(Figure: comparison of Logistic + RSS vs. Logistic GIBSS)
An under-appreciated source of “LD mismatch”?
\(n = 500\), \(b_0 = -1\), \(b = 0, 1, 2, 3\)
Figure 2: Wakefield’s ABF can be an order of magnitude off when the \(z\)-score is large
![](resources/abf_biased.png)
![](resources/abf_eq.png)
Simulation: one causal variant in the locus that explains \(1\%\) of heritability of liability. \(h^2 = 0.1, 0.2, 0.5, 0.9\)
\[\begin{align*} y &\sim Bin(1, \sigma(\psi)) \\ \psi &= b_0 + b x + \epsilon \\ \epsilon &\sim N(0, \sigma^2) \end{align*}\]
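A sketch of this data-generating process (the genotype frequency and RNG seed are my own illustrative choices; \(n\), \(b_0\), and \(b\) follow the values stated above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, b0, b, sigma2 = 500, -1.0, 1.0, 1.0  # b ranges over {0, 1, 2, 3} in the notes

x = rng.binomial(2, 0.3, size=n)                 # hypothetical genotype at one SNP
eps = rng.normal(0.0, np.sqrt(sigma2), size=n)   # liability-scale noise
psi = b0 + b * x + eps
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-psi)))  # y ~ Bin(1, sigmoid(psi))
```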
95% C.I. for different \(h^2\)
\[ \begin{align*} y_i &\sim Bin\left(1, \sigma \left(b_0 + \sum_{j=1}^q b_j x_{ij} + \delta\right)\right)\\ b &\sim N(0, \sigma^2) \\ \delta &\sim N(0, \nu - q \sigma^2)\\ \end{align*} \]
Value | Description |
---|---|
\(X\) | Standardized genotypes |
\(\sigma^2\) | Variance of standardized effects, i.e. \(b \sim N(0, \sigma^2)\) |
\(q\) | Number of causal variants in locus |
\(\rho\) | Fraction of variance of genetic component in-locus |
\(k\) | Fraction of cases (determines \(b_0\)) |
\(q\sigma^2\) | (Expected) variance of genetic component in-locus |
\(\nu\) | \(q \sigma^2/\rho\), (expected) variance of genetic component |
\(h^2\) | \(\nu / (\nu + \pi^2/3)\), (expected) heritability of liability |
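The last three rows of the table compose as in this small sketch (the parameter values are hypothetical):

```python
import numpy as np

q, sigma2, rho = 1, 0.01, 0.5    # hypothetical: 1 causal variant, half the genetic variance in-locus
nu = q * sigma2 / rho            # (expected) variance of the full genetic component
h2 = nu / (nu + np.pi ** 2 / 3)  # heritability of liability under the logistic link
```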
(8): Alzheimer’s meta-analysis combining linear and logistic association studies. (9): logistic mixed model (SAIGE) + SuSiE.
A few options:
Limiting BF
Idea: put a normal prior on all covariates, \(\begin{bmatrix} \alpha \\ \beta \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} I_{p-1} \tau_0^{-1} & 0 \\ 0 & \tau_1^{-1} \end{bmatrix}\right)\), and compute a Laplace approximation to the BF. Take \(\tau_0 \rightarrow 0^+\).
Q: How variable is the scaling factor? Can we get away with just using the univariate BF?
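For the univariate piece of this, a Laplace approximation to the log marginal likelihood can be sketched as follows (a toy single-covariate version with the intercept dropped; the names are mine, not from any package):

```python
import numpy as np

def laplace_log_ml(x, y, tau=1.0, iters=50):
    """Laplace approximation to log ∫ L(b) N(b | 0, 1/tau) db for a
    one-covariate logistic regression; Newton's method finds the mode."""
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-b * x))
        grad = np.sum((y - p) * x) - tau * b        # d/db log posterior
        hess = -np.sum(p * (1 - p) * x ** 2) - tau  # d^2/db^2 log posterior
        b -= grad / hess
    eta = b * x
    log_lik = np.sum(y * eta - np.log1p(np.exp(eta)))
    log_prior = 0.5 * np.log(tau / (2 * np.pi)) - 0.5 * tau * b ** 2
    return log_lik + log_prior + 0.5 * np.log(2 * np.pi / -hess)
```

Since the null likelihood with no intercept is \(2^{-n}\), the log BF against \(b = 0\) is `laplace_log_ml(x, y) + len(y) * np.log(2.0)`.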
Gauss-Hermite quadrature
\[ I = \int f(x) e^{-x^2} dx \approx \sum_{i=1}^n w_i f(x_i) \]
\((x_i)_{i=1}^n\) are the roots of the Hermite polynomial \(H_n(x)\), \(w_i = \frac{2^{n-1} n! \sqrt{\pi}}{n^2 H_{n-1}^2 (x_i)}\)
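NumPy provides these nodes and weights directly; a quick check on a known integral, \(\int \cos(x)\, e^{-x^2}\, dx = \sqrt{\pi}\, e^{-1/4}\):

```python
import numpy as np

x, w = np.polynomial.hermite.hermgauss(20)  # 20-point rule: nodes and weights
approx = np.sum(w * np.cos(x))
exact = np.sqrt(np.pi) * np.exp(-0.25)
```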
\[ I = \int f(x) dx = \int \left[\frac{f(x)}{q(x)} \right] q(x) dx, \;\; q(x) = N(x | \mu, \sigma^2)\; \text{s.t.}\; \frac{f}{q} \approx 1 \]
(Note: a change of variables and a scaling factor are needed to apply the \(n\)-point Hermite quadrature rule with a Gaussian \(q\).)
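Concretely, for \(q = N(\mu, \sigma^2)\) the substitution \(x = \mu + \sqrt{2}\sigma t\) gives \(\mathbb E_q[f] \approx \frac{1}{\sqrt\pi}\sum_i w_i f(\mu + \sqrt{2}\sigma t_i)\). A sketch, applied to a logistic-normal expectation of the kind that appears in logistic BFs (the function name is mine):

```python
import numpy as np

def gauss_hermite_expect(f, mu, sigma, n=30):
    """E[f(X)] for X ~ N(mu, sigma^2) via an n-point Gauss-Hermite rule.
    x = mu + sqrt(2)*sigma*t maps the Gaussian onto the weight e^{-t^2};
    the 1/sqrt(pi) factor absorbs the Jacobian and normalizing constant."""
    t, w = np.polynomial.hermite.hermgauss(n)
    return np.sum(w * f(mu + np.sqrt(2.0) * sigma * t)) / np.sqrt(np.pi)

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
val = gauss_hermite_expect(sigmoid, 0.0, 1.0)  # symmetry gives exactly 1/2
```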