GENetLib's documentation

GENetLib is a Python library for gene–environment interaction analysis via deep learning.

This Page

The ScalarGE Model

The ScalarGE model is designed to characterize gene-environment (G-E) interaction effects between high-dimensional (scalar) genetic variables and environmental variables.

Notation

Consider a dataset with \(n\) independent individuals.

Parameter

Notation

Outcome variables

\(\boldsymbol{y} = (y_1, \ldots, y_n)^{\top}\)

G variables

\(\boldsymbol{X}=(\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n)^{\top}\) where \(\boldsymbol{X}_i=(X_{i1}, \dots, X_{ip})\)

E variables

\(\boldsymbol{Z} = (\boldsymbol{Z}_1, \ldots, \boldsymbol{Z}_n)^{\top}\) where \(\boldsymbol{Z}_i=(Z_{i1}, \dots, Z_{iq})\)

G-E interaction variables

\(\boldsymbol{V} = (\boldsymbol{V}_1, \ldots, \boldsymbol{V}_n)^\top\) where \(\boldsymbol{V}_i = \boldsymbol{Z}_i \otimes \boldsymbol{X}_i\)

Combination of G and G-E variables

\(\boldsymbol{H} = (\boldsymbol{X}, \boldsymbol{V})\)

Thus, we define \(\boldsymbol{W}=(\boldsymbol{X}, \boldsymbol{V}, \boldsymbol{Z})\) where \(\boldsymbol{W}_i\) is the variable set for the \(i\)-th individual.

Loss functions

We designed three types of output layers for different outcomes:

  • Continuous layer, suited for outcomes like disease risk.

  • Binary layer, tailored for binary outcomes such as treatment response.

  • Cox layer, applicable to survival outcomes such as time to an event.

Next we generate the loss functions for these three layers:

Continuous layer

\[l_{\text{continuous}}(\boldsymbol{\theta})=\frac{1}{n}\sum_{i=1}^n \left[ y_i-g(\boldsymbol{W}_i\boldsymbol{\theta})\right]^2\]

where \(\boldsymbol{\theta}\) represents the neural network weights, and \(g\) is the functional form.

Binary layer

\[l_{\text{binary}}(\boldsymbol{\theta}) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i\log(p_i) + (1 - y_i) \log (1 - p_i)-\boldsymbol{W}_i \boldsymbol{\theta} \right]\]

where \(p_i = (1 + e^{-\boldsymbol{W}_i \boldsymbol{\theta}})^{-1}\) is the sigmoid function, and \(y_i\) represents the binary outcome.

Cox layer

\[l_{\text{Cox}}(\boldsymbol{\theta})=-\sum_{i:\delta_{i}=1}\biggl[h_{\boldsymbol{\theta}}(\boldsymbol{W}_i)-\log{\sum_{j\in R(T_{i})}e^{h_{\boldsymbol{\theta}}(\boldsymbol{W}_j)}}\biggr]\]

where \(y_i\) and \(c_i\) represent the survival time and censoring time, respectively. The minimum of \(y_i\) and \(c_i\) is denoted by \(T_i\), and the event indicator \(\delta_i\) is defined as \(I(y_i \leq c_i)\). The at-risk set at time \(T_i\) is \(R(T_i) = \{i' : T_{i'} \geq T_i\}\), and \(h_{\boldsymbol{\theta}}\) is the prognostic index output by the model.

Penalty approximation

In the sparse layer, define \(\gamma_j\) as the main effect and \(\beta_{1j}, \ldots, \beta_{qj}\) as the interaction effects. The coefficient vector associated with the corresponding G and G-E variables in \(H\) is denoted by \(\boldsymbol{b}_j = (\gamma_j, \beta_{1j}, \ldots, \beta_{qj})^{\top} = (b_{j1}, \ldots, b_{j(q+1)})^{\top}\), where \(j = 1, \ldots, p\).

Let \(K\) denote the total number of fully connected layers in the network, and \(\omega_k\) be the weight matrix for the \(k\)-th layer. The objective function is:

\[L(\boldsymbol{\theta}) = l_{\cdot}(\boldsymbol{\theta}) + \sum_{j=1}^p\rho(||\boldsymbol{b_j}||;\lambda_1,s) + \sum_{j=1}^p \sum_{k=1}^q \rho(|\beta_{kj}|; \lambda_2, s) + \lambda \biggl( \sum_{k=1}^K||\omega_k||_F^2 \biggr)\]

where \(l_{\cdot}(\boldsymbol{\theta})\) is the corresponding loss function depending on outcomes, \(|\cdot|\) denotes the \(L_2\)-norm, \(\rho(t; \lambda, s) = \lambda \int_0^{|t|} \left(1 - \frac{x}{\lambda s}\right)+ dx\) is the minimax concave penalty (MCP), and \(\lambda_1, \lambda_2 > 0\) are tuning parameters. \(|\cdot|_{F}\) denotes the Frobenius norm of a matrix, and \(\lambda > 0\) is another tuning parameter.

We set \(\lambda_1 = \sqrt{q + 1} \lambda_2\) and \(s = 3\), and use the local quadratic approximation (LQA) techniqueto optimize MCP penalties in \(l(\theta)\). The two MCP penalties can be approximated as:

\[\sum_{j=1}^p\rho(\|\boldsymbol{b}_j\|;\lambda_1,s) = \sum_{j=1}^{p}\frac{\rho_{\lambda_{1}}^{\prime}\left(\|\boldsymbol{b}_j^{(m)}\|\right)}{2\|\boldsymbol{b}_j^{(m)}\|}\frac{|\gamma_{j}^{(m)}|}{\|\boldsymbol{b}_j^{(m)}\|}\gamma_{j}^{2}\]
\[\sum_{j=1}^p\sum_{k=1}^q\rho(|\beta_{kj}|;\lambda_2,s)=\sum_{j=1}^p\sum_{k=1}^q\left[\frac{\rho_{\lambda_1}^{\prime}\left(\|\boldsymbol{b}_j^{(m)}\|\right)}{2\|\boldsymbol{b}_j^{(m)}\|}\frac{|\beta_{kj}^{(m)}|}{\|\boldsymbol{b}_j^{(m)}\|}+\frac{\rho_{\lambda_2}^{\prime}\left(|\beta_{kj}^{(m)}|\right)}{2|\beta_{kj}^{(m)}|}\right]\beta_{kj}^2\]

where \(m\) denotes the \(m\)-th iteration, and \(\rho_\lambda^{\prime}(t) = (\lambda - \frac{t}{s})_+\) is the first derivative of the MCP penalty.

We minimize the objective function, iterating the estimation process until the training converges and parameters stabilize folloing the algorithm below.

Algorithm: training of ScalarGE

Input:

  • Gene variables \(\boldsymbol{X}\) and environmental variables \(\boldsymbol{Z}\);

  • Survival output \((T,\delta)\) or continuous outputs \(y\) or binary output \(y\);

  • Learning rates of the sparse layer and the fully connected layers \(\{\alpha_1,\alpha_2\}\);

  • Tuning and regularization parameters of the MCP penalty \(\{\lambda_1, \lambda_2, s\}\);

  • Tuning parameter of the fully connected layers \(\lambda\).

Initialize:

  • Sparse layer \(\boldsymbol{b}^{(0)}\), \(k\)-th fully connected layer \(\omega_k^{(0)}\), \(m = 0\).

Repeat:

  • Update the approximated MCP penalties with the current estimate \(\boldsymbol{b}^{(m)}\);

  • Update Loss \(= l.(\boldsymbol{\theta}) + \text{approximated MCP penalties} + \lambda \sum_{k=1}^{K} \|\omega_{k}^{(m)}\|_F^2\);

  • Conduct back propagation, and obtain the gradients \(\frac{\partial \text{Loss}}{\partial \boldsymbol{b}_j^{(m)}}\) and \(\frac{\partial \text{Loss}}{\partial \omega_k^{(m)}}\);

  • For \(j = 1\) to \(p\) do

    • Update estimates \(\gamma_j^{(m+1)} = \gamma_j^{(m)} - \alpha_1 \frac{\partial \text{Loss}}{\partial \gamma_j^{(m)}}\);

    • For \(k = 1\) to \(q\) do

      • Update estimates \(\beta_{kj}^{(m+1)} = \beta_{kj}^{(m)} - \alpha_1 \frac{\partial \text{Loss}}{\partial \beta_{kj}^{(m)}}\);

    • End for;

  • End for;

  • For \(k = 1\) to \(K\) do

    • Update \(\omega_k^{(m+1)} = \omega_k^{(m)} - \alpha_2 \frac{\partial \text{Loss}}{\partial \omega_k^{(m)}}\);

  • End for;

  • Update \(m = m + 1\);

Until convergence or \(m\) reaches its maximum.

Previous: Methods | Next: The FuncGE Model