GENetLib's documentation

GENetLib is a Python library for gene–environment interaction analysis via deep learning.

scalar_ge

G-E interaction analysis via deep learning when the input X is scalar.

Description

This function provides an approach based on a neural network combined with MCP and L2 penalization. It simultaneously conducts model estimation and selects important main G effects and G–E interactions, while respecting the “main effects, interactions” variable selection hierarchy.
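
For reference, the MCP (minimax concave penalty) on a weight w with tuning parameter λ and concavity parameter γ equals λ|w| - w²/(2γ) when |w| ≤ γλ and the constant γλ²/2 otherwise, so large weights stop being shrunk. The sketch below evaluates it next to an L2 penalty with NumPy. It is a minimal illustration under stated assumptions, not GENetLib's code: the default gamma=3.0, the form Lambda * sum(w**2) of the L2 term, and the names mcp_penalty and l2_penalty are all hypothetical.

```python
import numpy as np

def mcp_penalty(w, lam, gamma=3.0):
    """Minimax concave penalty summed over the entries of w.

    gamma=3.0 is an illustrative assumption, not a value taken from GENetLib.
    """
    a = np.abs(np.asarray(w, dtype=float))
    # Region |w| <= gamma * lam: linear penalty minus a quadratic correction
    inner = lam * a - a ** 2 / (2.0 * gamma)
    # Region |w| > gamma * lam: constant, so large weights are not shrunk further
    outer = 0.5 * gamma * lam ** 2
    return float(np.sum(np.where(a <= gamma * lam, inner, outer)))

def l2_penalty(w, Lambda):
    """Ridge-type penalty Lambda * ||w||_2^2 (the scaling is an assumption)."""
    w = np.asarray(w, dtype=float)
    return float(Lambda * np.sum(w ** 2))

if __name__ == "__main__":
    weights = np.array([0.0, 0.05, 0.5, 2.0])
    print(mcp_penalty(weights, lam=0.09), l2_penalty(weights, Lambda=0.2))
```

The two pieces of the MCP meet at |w| = γλ, which keeps the penalty continuous. This page does not specify which network weights lambda1, lambda2 and Lambda are applied to inside ScalarGE.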

See also sim_data_scalar and grid_scalar_ge. The underlying model is ScalarGE.
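
The “main effects, interactions” hierarchy means a G–E interaction is only reported as important when the main effect of its G variable is also important. The sketch below expresses that rule as a post-hoc filter on weight magnitudes. It is illustrative only: the helper select_with_hierarchy, the weight layout, and the use of a single threshold are assumptions, and ScalarGE may enforce the hierarchy differently (for example through the network structure itself).

```python
import numpy as np

def select_with_hierarchy(g_weights, gxe_weights, threshold):
    """Hypothetical helper: pick important G effects and G-E interactions.

    g_weights:   shape (p,), one weight per genetic variable G_j.
    gxe_weights: shape (p, q), weight of the interaction G_j x E_k.
    An interaction is kept only if its magnitude exceeds the threshold
    AND the main effect of G_j was kept (the hierarchy).
    """
    g = np.abs(np.asarray(g_weights, dtype=float))
    gxe = np.abs(np.asarray(gxe_weights, dtype=float))
    main_selected = g > threshold
    # Broadcast the main-effect mask over all E columns of each G row
    inter_selected = (gxe > threshold) & main_selected[:, None]
    # Indices of selected G variables, and (j, k) pairs of selected interactions
    return np.flatnonzero(main_selected), np.argwhere(inter_selected)

if __name__ == "__main__":
    main, inter = select_with_hierarchy(
        [0.5, 0.0, 0.3], [[0.4, 0.0], [0.9, 0.0], [0.0, 0.2]], threshold=0.1
    )
    print(main, inter)  # G_1's strong interaction is dropped: its main effect is zero
```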

Usage

scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs, learning_rate1, learning_rate2, lambda1 = None, lambda2 = None, Lambda = None, threshold = None, split_type = 0, ratio = [7, 3], important_feature = True, plot = True)

Parameters

This section lists the meaning and data type of each parameter. Use the table below to configure a ScalarGE model.

Parameter

Description

y

array or dataframe, the response variable.

G

array or dataframe, the scalar genetic variable.

E

array or dataframe, the scalar environmental variable.

ytype

string, the type of the response y: “Survival”, “Binary” or “Continuous”.

num_hidden_layers

numeric, number of hidden layers in the neural network.

nodes_hidden_layer

list, the number of nodes in each hidden layer; its length should match num_hidden_layers.

num_epochs

numeric, number of epochs for neural network training.

learning_rate1

numeric, learning rate of sparse layers.

learning_rate2

numeric, learning rate of hidden layers.

lambda1

numeric or None, tuning parameter of the first MCP penalization.

lambda2

numeric or None, tuning parameter of the second MCP penalization.

Lambda

numeric or None, tuning parameter of the L2 penalization.

threshold

numeric or None, threshold used in the selection of important features.

split_type

integer, type of data split. If split_type = 0, the data are divided into a training set and a validation set. If split_type = 1, the data are divided into a training set, a validation set and a test set.

ratio

list, the ratio of the data split, with one entry per subset (the default [7, 3] gives a 70/30 training/validation split).

important_feature

bool, whether to display the selected important features.

plot

bool, whether to show the line plot of residuals against the number of training epochs.
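
To make split_type and ratio concrete, the sketch below shuffles row indices and cuts them in proportion to ratio, producing two subsets for a two-entry ratio and three for a three-entry one. It is a hypothetical illustration: the helper split_indices, the shuffling, the seed, and the rounding rule are assumptions and may not match how GENetLib splits the data internally.

```python
import numpy as np

def split_indices(n, ratio, seed=0):
    """Hypothetical helper: shuffle indices 0..n-1 and cut them by ratio.

    ratio=[7, 3] -> training/validation; ratio=[6, 2, 2] -> training/validation/test.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    weights = np.asarray(ratio, dtype=float)
    # Cut points from the cumulative ratio; multiply before dividing so
    # integer ratios give exact cut positions (e.g. [7, 3], n=100 -> [70])
    cuts = np.floor(np.cumsum(weights)[:-1] * n / weights.sum()).astype(int)
    return np.split(perm, cuts)

if __name__ == "__main__":
    train_idx, val_idx = split_indices(1500, [7, 3])
    print(len(train_idx), len(val_idx))
```

With the default ratio = [7, 3] and the 1500 samples used in the example further down, this sketch yields 1050 training rows and 450 validation rows.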

Value

The function scalar_ge returns a tuple containing the following training results of the ScalarGE model:

  • Residual of the training set.

  • Residual of the validation set.

  • C index (y is survival) or R2 (y is continuous or binary) of the training set.

  • C index (y is survival) or R2 (y is continuous or binary) of the validation set.

  • The trained neural network.

  • Important features among the genetic (G) variables.

  • Important features of G-E interaction variables.
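
For survival outcomes, the C index measures how well predicted risks order the observed event times: 1.0 is a perfect ordering and 0.5 is no better than chance. The sketch below assumes Harrell's definition (this page does not say which variant GENetLib reports) and assumes a higher model output means a higher risk; harrell_c_index is a hypothetical name, and the double loop is O(n²), which is fine for illustration.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    time:  observed times; event: 1 if the event occurred, 0 if censored;
    risk:  predicted risk scores (higher risk = earlier expected event).
    A pair (i, j) is comparable when time[i] < time[j] and event[i] == 1.
    Tied risk scores count as half-concordant.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if event[i] != 1:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

if __name__ == "__main__":
    print(harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]))
```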

Here is an example output from a trained model:

[Figure: example output of scalar_ge]

For visualization, this function can also plot the residuals against the number of neural network training epochs. Here is an example output:

[Figure: residuals against training epochs for scalar_ge]

Examples

Here is a quick example of using this function:

from GENetLib.sim_data import sim_data_scalar
from GENetLib.scalar_ge import scalar_ge

# Model settings: survival response, two hidden layers with 1000 and 100 nodes
ytype = 'Survival'
num_hidden_layers = 2
nodes_hidden_layer = [1000, 100]
num_epochs = 100
# Learning rates of the sparse layers (1) and of the hidden layers (2)
learning_rate1 = 0.09
learning_rate2 = 0.015
# Tuning parameters of the second MCP penalization and of the L2 penalization
lambda2 = 0.09
Lambda = 0.2

# Simulate scalar G-E data with a survival response
scalar_survival_linear = sim_data_scalar(rho_G = 0.25, rho_E = 0.3, dim_G = 500, dim_E = 5, n = 1500, dim_E_Sparse = 2, ytype = ytype, n_inter = 30)
y = scalar_survival_linear['y']
G = scalar_survival_linear['G']
E = scalar_survival_linear['E']

# Fit the ScalarGE model; lambda1 is left as None
scalar_ge_res = scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs, learning_rate1, learning_rate2, lambda1 = None, lambda2 = lambda2, Lambda = Lambda)
