GENetLib's documentation

GENetLib is a Python library for gene–environment interaction analysis via deep learning.

grid_scalar_ge

Grid search for scalar_ge.

Description

This function performs a grid search for scalar_ge over candidate values of the regularization parameters lambda2 and Lambda and the learning rates learning_rate1 and learning_rate2.
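The general pattern of a grid search can be sketched as follows. This is a minimal illustration, not GENetLib's implementation: it assumes that every combination of candidate values is tried, and `toy_validation_loss` and `grid_search` are hypothetical names standing in for "train a ScalarGE model and measure its validation residual".

```python
from itertools import product

# Hypothetical candidate values; the keys mirror the grid_scalar_ge arguments.
grid = {
    "learning_rate1": [0.01, 0.02],
    "learning_rate2": [0.035, 0.045],
    "lambda2": [0.04, 0.06],
    "Lambda": [0.1],
}

def toy_validation_loss(learning_rate1, learning_rate2, lambda2, Lambda):
    """Stand-in for training a model and returning its validation residual.

    This is NOT GENetLib code; it is a smooth function with a known minimum
    so that the search has something to optimize.
    """
    return ((learning_rate1 - 0.02) ** 2 + (learning_rate2 - 0.035) ** 2
            + (lambda2 - 0.06) ** 2 + (Lambda - 0.1) ** 2)

def grid_search(grid, score):
    """Evaluate score on every combination and return the best one."""
    names = list(grid)
    best_params, best_loss = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = score(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = grid_search(grid, toy_validation_loss)
print(best_params)
```

In a real run the scoring step is the expensive part, since each combination requires training a network for num_epochs epochs.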

See also sim_data_scalar and scalar_ge. The underlying model is ScalarGE.

Usage

grid_scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs, learning_rate1, learning_rate2, lambda1 = None, lambda2 = None, Lambda = None, threshold = None, split_type = 0, ratio = [7, 3], important_feature = True, plot = True)

Parameters

This section lists the meaning and data type of each parameter. Use the table below to build an optimal ScalarGE model with the given parameters.

Parameter

Description

y

array or dataframe, the response variable.

G

array or dataframe, the scalar genetic variable.

E

array or dataframe, the scalar environmental variable.

ytype

character, type of the response y: “Survival”, “Binary” or “Continuous”.

num_hidden_layers

numeric, number of hidden layers in the neural network.

nodes_hidden_layer

list, the number of nodes in each hidden layer.

num_epochs

numeric, number of epochs for neural network training.

learning_rate1

list, learning rates of sparse layers.

learning_rate2

list, learning rates of hidden layers.

lambda1

numeric or None, tuning parameter of the first MCP penalization.

lambda2

list, tuning parameters of the second MCP penalization.

Lambda

list, tuning parameters of L2 penalization.

threshold

numeric, threshold in the selection of important features.

split_type

integer, type of data split. If split_type = 0, the data are divided into a training set and a validation set. If split_type = 1, the data are divided into a training set, a validation set and a test set.

ratio

list, the ratio of data split.

important_feature

bool, “True” or “False”, whether or not to show output features.

plot

bool, “True” or “False”, whether to show a line plot of the residuals against the number of training epochs.
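The split_type and ratio parameters in the table above describe how the samples are partitioned. A minimal sketch of a ratio-based split is given below. It is an illustration under stated assumptions, not the library's code: `split_indices` is a hypothetical helper, the shuffling and the rounding rule are assumptions, and the three-entry ratio `[3, 1, 1]` is an invented value used only to show a training/validation/test split.

```python
import numpy as np

def split_indices(n, ratio, seed=0):
    """Shuffle range(n) and cut it into len(ratio) parts proportional to ratio."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)
    # Cumulative cut points, e.g. ratio [7, 3] with n = 1500 gives [1050].
    cuts = (np.cumsum(ratio)[:-1] * n) // sum(ratio)
    return np.split(shuffled, cuts)

# Two-way split, as with split_type = 0 and the default ratio = [7, 3].
train_idx, valid_idx = split_indices(1500, [7, 3])
print(len(train_idx), len(valid_idx))  # 1050 450

# Three-way split, as with split_type = 1 (the ratio here is hypothetical).
train3, valid3, test3 = split_indices(1500, [3, 1, 1])
print(len(train3), len(valid3), len(test3))  # 900 300 300
```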

Value

The function grid_scalar_ge returns a tuple that includes the training results and the optimal parameters of the ScalarGE model:

  • Values of the tuning parameters selected by the grid search.

  • Residual of the training set.

  • Residual of the validation set.

  • C-index (if y is survival) or R² (if y is continuous or binary) of the training set.

  • C-index (if y is survival) or R² (if y is continuous or binary) of the validation set.

  • A neural network after training.

  • Important features of gene variables.

  • Important features of GE interaction variables.
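The two performance measures named in the list above can be computed as sketched below. These are the textbook definitions (Harrell's concordance index and the coefficient of determination); whether GENetLib handles ties or binary responses in exactly this way is an assumption, and `c_index` and `r_squared` are hypothetical helper names.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def c_index(time, event, risk):
    """Harrell's concordance index.

    A pair (i, j) is comparable when subject i has an observed event
    (event[i] == 1) and time[i] < time[j]. It is concordant when the
    subject who failed earlier has the higher predicted risk; tied
    risks count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: four subjects, the third one is censored (event = 0).
print(c_index(time=[2, 4, 6, 8], event=[1, 1, 0, 1], risk=[0.9, 0.7, 0.8, 0.1]))  # 0.8
print(r_squared([1, 2, 3], [1, 2, 4]))  # 0.5
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of risks; R² equals 1.0 for perfect predictions.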

Here is an example output for an established model:

../_images/grid_scalar_ge.png

For visualization, the function can also draw a line plot of the residuals against the number of training epochs. Here is an example output:

../_images/grid_scalar_ge_train.png

Examples

Here is a quick example for using this function:

from GENetLib.sim_data import sim_data_scalar
from GENetLib.grid_scalar_ge import grid_scalar_ge

# Network settings
ytype = 'Survival'
num_hidden_layers = 2
nodes_hidden_layer = [1000, 100]
num_epochs = 100

# Candidate values for the grid search
learning_rate1 = [0.01, 0.02, 0.03, 0.04, 0.05]
learning_rate2 = [0.035, 0.045]
lambda2 = [0.04, 0.06, 0.07, 0.09]
Lambda = [0.1]

# Simulate scalar gene-environment data with a survival response
scalar_survival_linear = sim_data_scalar(rho_G = 0.25, rho_E = 0.3, dim_G = 500, dim_E = 5, n = 1500,
                                         dim_E_Sparse = 2, ytype = 'Survival', n_inter = 30)
y = scalar_survival_linear['y']
G = scalar_survival_linear['G']
E = scalar_survival_linear['E']

# Run the grid search over the candidate values
grid_scalar_ge_res = grid_scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs,
                                    learning_rate1, learning_rate2, lambda1 = None, lambda2 = lambda2,
                                    Lambda = Lambda, threshold = 0.05)
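To estimate how long this example may run, it helps to count the candidate combinations. The sketch below assumes that grid_scalar_ge fits one model per combination of the four candidate lists, which the documentation does not state explicitly.

```python
from itertools import product

# Candidate lists copied from the example above.
learning_rate1 = [0.01, 0.02, 0.03, 0.04, 0.05]
learning_rate2 = [0.035, 0.045]
lambda2 = [0.04, 0.06, 0.07, 0.09]
Lambda = [0.1]

# Assumption: the search covers the full Cartesian product of the lists.
combinations = list(product(learning_rate1, learning_rate2, lambda2, Lambda))
print(len(combinations))  # 40
```

Under that assumption, the example trains 40 networks of 100 epochs each, so shortening any of the lists reduces the running time proportionally.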
