scalar_ge¶
G-E interaction analysis via deep learning when the input X is scalar.
Description¶
This function provides a neural-network-based approach that combines MCP and L2 penalization to simultaneously conduct model estimation and selection of important main G effects and G–E interactions, while uniquely respecting the “main effects, interactions” variable selection hierarchy.
See also sim_data_scalar and grid_scalar_ge. The model is ScalarGE.
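For reference, the MCP (minimax concave penalty) named above has a standard closed form. The sketch below implements that textbook definition in plain Python. The function name `mcp_penalty` and the concavity parameter `gamma` (including its default of 3.0) are illustrative assumptions, not taken from GENetLib, whose internal implementation may differ.

```python
def mcp_penalty(t, lam, gamma=3.0):
    """Textbook MCP penalty for a single coefficient t.

    lam   : regularization strength (lam >= 0)
    gamma : concavity parameter (gamma > 1); the default is an assumption
    """
    a = abs(t)
    if a <= gamma * lam:
        # Lasso-like near zero, with the penalization rate tapering off
        return lam * a - a * a / (2.0 * gamma)
    # Constant beyond gamma * lam, so large coefficients are not shrunk further
    return 0.5 * gamma * lam * lam


# Small coefficients are penalized roughly like the lasso;
# large ones all receive the same capped penalty.
small = mcp_penalty(0.1, lam=0.09)
large = mcp_penalty(5.0, lam=0.09)
```

Because the penalty flattens out for large coefficients, MCP tends to bias strong effects less than an L1 penalty would, which is the usual motivation for using it in variable selection.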
Usage¶
scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs, learning_rate1, learning_rate2, lambda1 = None, lambda2 = None, Lambda = None, threshold = None, split_type = 0, ratio = [7, 3], important_feature = True, plot = True)
Parameters¶
This section lists the meaning and data type of each parameter. Use the table below to configure a ScalarGE model.
| Parameter | Description |
|---|---|
| y | array or dataframe, the response variable. |
| G | array or dataframe, the scalar genetic variable. |
| E | array or dataframe, the scalar environmental variable. |
| ytype | character, “Survival”, “Binary” or “Continuous” type of the output y. |
| num_hidden_layers | numeric, number of hidden layers in the neural network. |
| nodes_hidden_layer | list, contains number of nodes in each hidden layer. |
| num_epochs | numeric, number of epochs for neural network training. |
| learning_rate1 | numeric, learning rate of sparse layers. |
| learning_rate2 | numeric, learning rate of hidden layers. |
| lambda1 | numeric or None, tuning parameter of the first MCP penalization. |
| lambda2 | numeric, tuning parameter of the second MCP penalization. |
| Lambda | numeric, tuning parameter of L2 penalization. |
| threshold | numeric, threshold in the selection of important features. |
| split_type | integer, types of data split. If split_type = 0, the data is divided into a training set and a validation set. If split_type = 1, the data is divided into a training set, a validation set and a test set. |
| ratio | list, the ratio of data split. |
| important_feature | bool, “True” or “False”, whether or not to show output features. |
| plot | bool, “True” or “False”, whether or not to show the line plot of residuals with the number of neural network epochs. |
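
To illustrate how `split_type` and `ratio` relate, the sketch below converts a ratio list into subset sizes. `split_sizes` is a hypothetical helper, not part of GENetLib. It assumes the ratio entries are relative weights, so that the default `[7, 3]` corresponds to a 70/30 training/validation split. How the library itself rounds or shuffles samples is not specified here.

```python
def split_sizes(n, ratio):
    """Turn relative weights such as [7, 3] into subset sizes that sum to n."""
    total = sum(ratio)
    sizes = [n * r // total for r in ratio]
    # Give any samples lost to integer division to the first (training) subset
    sizes[0] += n - sum(sizes)
    return sizes


# split_type = 0: training and validation sets
train_n, valid_n = split_sizes(1500, [7, 3])

# split_type = 1: training, validation and test sets
# (the three-part ratio here is only an illustrative choice)
train_n3, valid_n3, test_n3 = split_sizes(1500, [3, 1, 1])
```

Under this reading, `ratio` would need two entries when `split_type = 0` and three entries when `split_type = 1`.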
Value¶
The function scalar_ge returns a tuple containing the training results of the ScalarGE model:
Residual of the training set.
Residual of the validation set.
C index (if y is survival) or R2 (if y is continuous or binary) of the training set.
C index (if y is survival) or R2 (if y is continuous or binary) of the validation set.
A neural network after training.
Important features of gene variables.
Important features of G-E interaction variables.
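The C index reported for survival outcomes measures how often a model ranks pairs of subjects in the correct order. A minimal sketch of Harrell's concordance index follows. The function name and arguments are hypothetical, tied event times are simply treated as non-comparable, and this is not necessarily how GENetLib computes the value.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times       : observed times
    events      : 1 if the event was observed, 0 if censored
    risk_scores : higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, while 0.5 corresponds to random ranking.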
Here is an example output for an established model:
For visualization, this function can output a line plot of the residuals against the number of training epochs. Here is an example output:
Examples¶
Here is a quick example for using this function:
from GENetLib.sim_data import sim_data_scalar
from GENetLib.scalar_ge import scalar_ge
# Network architecture and training settings
ytype = 'Survival'
num_hidden_layers = 2
nodes_hidden_layer = [1000, 100]
num_epochs = 100
learning_rate1 = 0.09  # learning rate of sparse layers
learning_rate2 = 0.015  # learning rate of hidden layers
# Penalization settings
lambda2 = 0.09  # second MCP penalization
Lambda = 0.2  # L2 penalization
# Simulate scalar G-E data with a survival outcome
scalar_survival_linear = sim_data_scalar(rho_G = 0.25, rho_E = 0.3, dim_G = 500, dim_E = 5, n = 1500, dim_E_Sparse = 2, ytype = ytype, n_inter = 30)
y = scalar_survival_linear['y']
G = scalar_survival_linear['G']
E = scalar_survival_linear['E']
# Fit the ScalarGE model
scalar_ge_res = scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs, learning_rate1, learning_rate2, lambda1 = None, lambda2 = lambda2, Lambda = Lambda)
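The returned tuple can be labeled with descriptive names for readability. The sketch below assumes the tuple holds the seven items in the order listed in the Value section. The helper `label_scalar_ge_result` and its key names are hypothetical, and the stand-in tuple of placeholder strings only takes the place of a real `scalar_ge_res`.

```python
# Key names are illustrative; the order is assumed to follow the Value section.
RESULT_FIELDS = (
    "train_residual",
    "valid_residual",
    "train_metric",  # C index or R2, depending on ytype
    "valid_metric",
    "model",
    "important_G",
    "important_GE",
)


def label_scalar_ge_result(res):
    """Map a scalar_ge result tuple to a dict keyed by descriptive names."""
    if len(res) != len(RESULT_FIELDS):
        raise ValueError(
            f"expected {len(RESULT_FIELDS)} items, got {len(res)}"
        )
    return dict(zip(RESULT_FIELDS, res))


# Stand-in tuple of placeholder strings; in practice pass scalar_ge_res.
placeholder = ("res_train", "res_valid", "metric_train", "metric_valid",
               "net", "G_features", "GE_features")
labeled = label_scalar_ge_result(placeholder)
print(labeled["valid_metric"])
```

The length check makes a mismatch between the assumed layout and the actual return value fail loudly instead of silently mislabeling fields.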