The ScalarGE Model¶

The ScalarGE model is designed to characterize gene-environment (G-E) interaction effects between high-dimensional (scalar) genetic variables and environmental variables.

Notation¶

Consider a dataset with \(n\) independent individuals.

Parameter	Notation
Outcome variables	\(\boldsymbol{y} = (y_1, \ldots, y_n)^{\top}\)
G variables	\(\boldsymbol{X}=(\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n)^{\top}\) where \(\boldsymbol{X}_i=(X_{i1}, \dots, X_{ip})\)
E variables	\(\boldsymbol{Z} = (\boldsymbol{Z}_1, \ldots, \boldsymbol{Z}_n)^{\top}\) where \(\boldsymbol{Z}_i=(Z_{i1}, \dots, Z_{iq})\)
G-E interaction variables	\(\boldsymbol{V} = (\boldsymbol{V}_1, \ldots, \boldsymbol{V}_n)^\top\) where \(\boldsymbol{V}_i = \boldsymbol{Z}_i \otimes \boldsymbol{X}_i\)
Combination of G and G-E variables	\(\boldsymbol{H} = (\boldsymbol{X}, \boldsymbol{V})\)

Thus, we define \(\boldsymbol{W}=(\boldsymbol{X}, \boldsymbol{V}, \boldsymbol{Z})\) where \(\boldsymbol{W}_i\) is the variable set for the \(i\)-th individual.

Loss functions¶

We designed three types of output layers for different outcomes:

Continuous layer, suited for outcomes like disease risk.
Binary layer, tailored for binary outcomes such as treatment response.
Cox layer, applicable to survival outcomes such as time to an event.

Next we generate the loss functions for these three layers:

Continuous layer

\[l_{\text{continuous}}(\boldsymbol{\theta})=\frac{1}{n}\sum_{i=1}^n \left[ y_i-g(\boldsymbol{W}_i\boldsymbol{\theta})\right]^2\]

where \(\boldsymbol{\theta}\) represents the neural network weights, and \(g\) is the functional form.

Binary layer

\[l_{\text{binary}}(\boldsymbol{\theta}) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i\log(p_i) + (1 - y_i) \log (1 - p_i)-\boldsymbol{W}_i \boldsymbol{\theta} \right]\]

where \(p_i = (1 + e^{-\boldsymbol{W}_i \boldsymbol{\theta}})^{-1}\) is the sigmoid function, and \(y_i\) represents the binary outcome.

Cox layer

\[l_{\text{Cox}}(\boldsymbol{\theta})=-\sum_{i:\delta_{i}=1}\biggl[h_{\boldsymbol{\theta}}(\boldsymbol{W}_i)-\log{\sum_{j\in R(T_{i})}e^{h_{\boldsymbol{\theta}}(\boldsymbol{W}_j)}}\biggr]\]

where \(y_i\) and \(c_i\) represent the survival time and censoring time, respectively. The minimum of \(y_i\) and \(c_i\) is denoted by \(T_i\), and the event indicator \(\delta_i\) is defined as \(I(y_i \leq c_i)\). The at-risk set at time \(T_i\) is \(R(T_i) = \{i' : T_{i'} \geq T_i\}\), and \(h_{\boldsymbol{\theta}}\) is the prognostic index output by the model.

Penalty approximation¶

In the sparse layer, define \(\gamma_j\) as the main effect and \(\beta_{1j}, \ldots, \beta_{qj}\) as the interaction effects. The coefficient vector associated with the corresponding G and G-E variables in \(H\) is denoted by \(\boldsymbol{b}_j = (\gamma_j, \beta_{1j}, \ldots, \beta_{qj})^{\top} = (b_{j1}, \ldots, b_{j(q+1)})^{\top}\), where \(j = 1, \ldots, p\).

Let \(K\) denote the total number of fully connected layers in the network, and \(\omega_k\) be the weight matrix for the \(k\)-th layer. The objective function is:

\[L(\boldsymbol{\theta}) = l_{\cdot}(\boldsymbol{\theta}) + \sum_{j=1}^p\rho(||\boldsymbol{b_j}||;\lambda_1,s) + \sum_{j=1}^p \sum_{k=1}^q \rho(|\beta_{kj}|; \lambda_2, s) + \lambda \biggl( \sum_{k=1}^K||\omega_k||_F^2 \biggr)\]

where \(l_{\cdot}(\boldsymbol{\theta})\) is the corresponding loss function depending on outcomes, \(|\cdot|\) denotes the \(L_2\)-norm, \(\rho(t; \lambda, s) = \lambda \int_0^{|t|} \left(1 - \frac{x}{\lambda s}\right)+ dx\) is the minimax concave penalty (MCP), and \(\lambda_1, \lambda_2 > 0\) are tuning parameters. \(|\cdot|_{F}\) denotes the Frobenius norm of a matrix, and \(\lambda > 0\) is another tuning parameter.

We set \(\lambda_1 = \sqrt{q + 1} \lambda_2\) and \(s = 3\), and use the local quadratic approximation (LQA) techniqueto optimize MCP penalties in \(l(\theta)\). The two MCP penalties can be approximated as:

\[\sum_{j=1}^p\rho(\|\boldsymbol{b}_j\|;\lambda_1,s) = \sum_{j=1}^{p}\frac{\rho_{\lambda_{1}}^{\prime}\left(\|\boldsymbol{b}_j^{(m)}\|\right)}{2\|\boldsymbol{b}_j^{(m)}\|}\frac{|\gamma_{j}^{(m)}|}{\|\boldsymbol{b}_j^{(m)}\|}\gamma_{j}^{2}\]

\[\sum_{j=1}^p\sum_{k=1}^q\rho(|\beta_{kj}|;\lambda_2,s)=\sum_{j=1}^p\sum_{k=1}^q\left[\frac{\rho_{\lambda_1}^{\prime}\left(\|\boldsymbol{b}_j^{(m)}\|\right)}{2\|\boldsymbol{b}_j^{(m)}\|}\frac{|\beta_{kj}^{(m)}|}{\|\boldsymbol{b}_j^{(m)}\|}+\frac{\rho_{\lambda_2}^{\prime}\left(|\beta_{kj}^{(m)}|\right)}{2|\beta_{kj}^{(m)}|}\right]\beta_{kj}^2\]

where \(m\) denotes the \(m\)-th iteration, and \(\rho_\lambda^{\prime}(t) = (\lambda - \frac{t}{s})_+\) is the first derivative of the MCP penalty.

We minimize the objective function, iterating the estimation process until the training converges and parameters stabilize folloing the algorithm below.

Algorithm: training of ScalarGE¶

Input:

Gene variables \(\boldsymbol{X}\) and environmental variables \(\boldsymbol{Z}\);
Survival output \((T,\delta)\) or continuous outputs \(y\) or binary output \(y\);
Learning rates of the sparse layer and the fully connected layers \(\{\alpha_1,\alpha_2\}\);
Tuning and regularization parameters of the MCP penalty \(\{\lambda_1, \lambda_2, s\}\);
Tuning parameter of the fully connected layers \(\lambda\).

Initialize:

Sparse layer \(\boldsymbol{b}^{(0)}\), \(k\)-th fully connected layer \(\omega_k^{(0)}\), \(m = 0\).

Repeat:

Update the approximated MCP penalties with the current estimate \(\boldsymbol{b}^{(m)}\);
Update Loss \(= l.(\boldsymbol{\theta}) + \text{approximated MCP penalties} + \lambda \sum_{k=1}^{K} \|\omega_{k}^{(m)}\|_F^2\);
Conduct back propagation, and obtain the gradients \(\frac{\partial \text{Loss}}{\partial \boldsymbol{b}_j^{(m)}}\) and \(\frac{\partial \text{Loss}}{\partial \omega_k^{(m)}}\);
For \(j = 1\) to \(p\) do
- Update estimates \(\gamma_j^{(m+1)} = \gamma_j^{(m)} - \alpha_1 \frac{\partial \text{Loss}}{\partial \gamma_j^{(m)}}\);
- For \(k = 1\) to \(q\) do
  - Update estimates \(\beta_{kj}^{(m+1)} = \beta_{kj}^{(m)} - \alpha_1 \frac{\partial \text{Loss}}{\partial \beta_{kj}^{(m)}}\);
- End for;
End for;
For \(k = 1\) to \(K\) do
- Update \(\omega_k^{(m+1)} = \omega_k^{(m)} - \alpha_2 \frac{\partial \text{Loss}}{\partial \omega_k^{(m)}}\);
End for;
Update \(m = m + 1\);

Until convergence or \(m\) reaches its maximum.

Previous: Methods | Next: The FuncGE Model

On this page

The ScalarGE Model

GENetLib's documentation

This Page

Quick search

The ScalarGE Model¶

Notation¶

Loss functions¶

Penalty approximation¶

Algorithm: training of ScalarGE¶