The ScalarGE Model
=========================

.. _scalargemodel-label:

The ScalarGE model is designed to characterize gene-environment (G-E) interaction effects between high-dimensional (scalar) genetic variables and environmental variables.


Notation
----------

Consider a dataset with :math:`n` independent individuals.

.. list-table:: 
   :widths: 30 70
   :header-rows: 1
   :align: center

   * - Parameter
     - Notation
   * - Outcome variables
     - :math:`\boldsymbol{y} = (y_1, \ldots, y_n)^{\top}`
   * - G variables
     - :math:`\boldsymbol{X}=(\boldsymbol{X}_1, \ldots, \boldsymbol{X}_n)^{\top}` where :math:`\boldsymbol{X}_i=(X_{i1}, \dots, X_{ip})`
   * - E variables
     - :math:`\boldsymbol{Z} = (\boldsymbol{Z}_1, \ldots, \boldsymbol{Z}_n)^{\top}` where :math:`\boldsymbol{Z}_i=(Z_{i1}, \dots, Z_{iq})`
   * - G-E interaction variables
     - :math:`\boldsymbol{V} = (\boldsymbol{V}_1, \ldots, \boldsymbol{V}_n)^\top` where :math:`\boldsymbol{V}_i = \boldsymbol{Z}_i \otimes \boldsymbol{X}_i`
   * - Combination of G and G-E variables
     - :math:`\boldsymbol{H} = (\boldsymbol{X}, \boldsymbol{V})`

Thus, we define :math:`\boldsymbol{W}=(\boldsymbol{X}, \boldsymbol{V}, \boldsymbol{Z})` where :math:`\boldsymbol{W}_i` is the variable set for the :math:`i`-th individual.


Loss functions
---------------

We designed three types of output layers for different outcomes:

- Continuous layer, suited for outcomes like disease risk.

- Binary layer, tailored for binary outcomes such as treatment response.

- Cox layer, applicable to survival outcomes such as time to an event.

Next we generate the loss functions for these three layers:

**Continuous layer**

.. math::
    l_{\text{continuous}}(\boldsymbol{\theta})=\frac{1}{n}\sum_{i=1}^n \left[ y_i-g(\boldsymbol{W}_i\boldsymbol{\theta})\right]^2

where :math:`\boldsymbol{\theta}` represents the neural network weights, and :math:`g` is the functional form.

**Binary layer**

.. math::
    l_{\text{binary}}(\boldsymbol{\theta}) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i\log(p_i) + (1 - y_i) \log (1 - p_i)-\boldsymbol{W}_i \boldsymbol{\theta} \right]

where :math:`p_i = (1 + e^{-\boldsymbol{W}_i \boldsymbol{\theta}})^{-1}` is the sigmoid function, and :math:`y_i` represents the binary outcome.

**Cox layer**

.. math::
    l_{\text{Cox}}(\boldsymbol{\theta})=-\sum_{i:\delta_{i}=1}\biggl[h_{\boldsymbol{\theta}}(\boldsymbol{W}_i)-\log{\sum_{j\in R(T_{i})}e^{h_{\boldsymbol{\theta}}(\boldsymbol{W}_j)}}\biggr]

where :math:`y_i` and :math:`c_i` represent the survival time and censoring time, respectively. The minimum of :math:`y_i` and :math:`c_i` is denoted by :math:`T_i`, and the event indicator :math:`\delta_i` is defined as :math:`I(y_i \leq c_i)`.
The at-risk set at time :math:`T_i` is :math:`R(T_i) = \{i' : T_{i'} \geq T_i\}`, and :math:`h_{\boldsymbol{\theta}}` is the prognostic index output by the model.


Penalty approximation
------------------------

In the sparse layer, define :math:`\gamma_j` as the main effect and :math:`\beta_{1j}, \ldots, \beta_{qj}` as the interaction effects.
The coefficient vector associated with the corresponding G and G-E variables in :math:`H` is denoted by :math:`\boldsymbol{b}_j = (\gamma_j, \beta_{1j}, \ldots, \beta_{qj})^{\top} = (b_{j1}, \ldots, b_{j(q+1)})^{\top}`, where :math:`j = 1, \ldots, p`.

Let :math:`K` denote the total number of fully connected layers in the network, and :math:`\omega_k` be the weight matrix for the :math:`k`-th layer. The objective function is:

.. math::
    L(\boldsymbol{\theta}) = l_{\cdot}(\boldsymbol{\theta}) + \sum_{j=1}^p\rho(||\boldsymbol{b_j}||;\lambda_1,s) + \sum_{j=1}^p \sum_{k=1}^q \rho(|\beta_{kj}|; \lambda_2, s) + \lambda \biggl( \sum_{k=1}^K||\omega_k||_F^2 \biggr)

where :math:`l_{\cdot}(\boldsymbol{\theta})` is the corresponding loss function depending on outcomes, :math:`|\cdot|` denotes the :math:`L_2`-norm, :math:`\rho(t; \lambda, s) = \lambda \int_0^{|t|} \left(1 - \frac{x}{\lambda s}\right)+ dx`
is the minimax concave penalty (MCP), and :math:`\lambda_1, \lambda_2 > 0` are tuning parameters. :math:`|\cdot|_{F}` denotes the Frobenius norm of a matrix, and :math:`\lambda > 0` is another tuning parameter.

We set :math:`\lambda_1 = \sqrt{q + 1} \lambda_2` and :math:`s = 3`, and use the local quadratic approximation (LQA) techniqueto optimize MCP penalties in :math:`l(\theta)`.
The two MCP penalties can be approximated as:

.. math::
    \sum_{j=1}^p\rho(\|\boldsymbol{b}_j\|;\lambda_1,s) = \sum_{j=1}^{p}\frac{\rho_{\lambda_{1}}^{\prime}\left(\|\boldsymbol{b}_j^{(m)}\|\right)}{2\|\boldsymbol{b}_j^{(m)}\|}\frac{|\gamma_{j}^{(m)}|}{\|\boldsymbol{b}_j^{(m)}\|}\gamma_{j}^{2}

.. math::
    \sum_{j=1}^p\sum_{k=1}^q\rho(|\beta_{kj}|;\lambda_2,s)=\sum_{j=1}^p\sum_{k=1}^q\left[\frac{\rho_{\lambda_1}^{\prime}\left(\|\boldsymbol{b}_j^{(m)}\|\right)}{2\|\boldsymbol{b}_j^{(m)}\|}\frac{|\beta_{kj}^{(m)}|}{\|\boldsymbol{b}_j^{(m)}\|}+\frac{\rho_{\lambda_2}^{\prime}\left(|\beta_{kj}^{(m)}|\right)}{2|\beta_{kj}^{(m)}|}\right]\beta_{kj}^2

where :math:`m` denotes the :math:`m`-th iteration, and :math:`\rho_\lambda^{\prime}(t) = (\lambda - \frac{t}{s})_+` is the first derivative of the MCP penalty.

We minimize the objective function, iterating the estimation process until the training converges and parameters stabilize folloing the algorithm below.


Algorithm: training of ScalarGE
-------------------------------

**Input**:

- Gene variables :math:`\boldsymbol{X}` and environmental variables :math:`\boldsymbol{Z}`;

- Survival output :math:`(T,\delta)` or continuous outputs :math:`y` or binary output :math:`y`;

- Learning rates of the sparse layer and the fully connected layers :math:`\{\alpha_1,\alpha_2\}`;

- Tuning and regularization parameters of the MCP penalty :math:`\{\lambda_1, \lambda_2, s\}`;

- Tuning parameter of the fully connected layers :math:`\lambda`.

**Initialize**:

- Sparse layer :math:`\boldsymbol{b}^{(0)}`, :math:`k`-th fully connected layer :math:`\omega_k^{(0)}`, :math:`m = 0`.

**Repeat**:

- Update the approximated MCP penalties with the current estimate :math:`\boldsymbol{b}^{(m)}`;

- Update Loss :math:`= l.(\boldsymbol{\theta}) + \text{approximated MCP penalties} + \lambda \sum_{k=1}^{K} \|\omega_{k}^{(m)}\|_F^2`;

- Conduct back propagation, and obtain the gradients :math:`\frac{\partial \text{Loss}}{\partial \boldsymbol{b}_j^{(m)}}` and :math:`\frac{\partial \text{Loss}}{\partial \omega_k^{(m)}}`;

- For :math:`j = 1` to :math:`p` do

  - Update estimates :math:`\gamma_j^{(m+1)} = \gamma_j^{(m)} - \alpha_1 \frac{\partial \text{Loss}}{\partial \gamma_j^{(m)}}`;

  - For :math:`k = 1` to :math:`q` do

    - Update estimates :math:`\beta_{kj}^{(m+1)} = \beta_{kj}^{(m)} - \alpha_1 \frac{\partial \text{Loss}}{\partial \beta_{kj}^{(m)}}`;

  - End for;

- End for;

- For :math:`k = 1` to :math:`K` do

  - Update :math:`\omega_k^{(m+1)} = \omega_k^{(m)} - \alpha_2 \frac{\partial \text{Loss}}{\partial \omega_k^{(m)}}`;

- End for;

- Update :math:`m = m + 1`;

Until convergence or :math:`m` reaches its maximum.