scalar_ge

G-E interaction analysis via deep learning when the input X is scalar.

Description

This function fits a neural network with MCP and L₂ penalization. It performs model estimation and selects important main G effects and G–E interactions at the same time, while respecting the “main effects, interactions” variable selection hierarchy.
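For readers unfamiliar with MCP (the minimax concave penalty), the sketch below evaluates the standard penalty on a single weight in plain Python. It illustrates the general form of the penalty only. The function name mcp_penalty and the concavity parameter gamma (default 3.0) are assumptions made for this illustration and are not taken from GENetLib; how scalar_ge applies the penalty inside the network is not shown here.

```python
def mcp_penalty(w, lam, gamma=3.0):
    """Standard MCP value for a single weight w.

    lam   -- tuning parameter (analogous to lambda1 / lambda2)
    gamma -- concavity parameter; 3.0 is an illustrative assumption
    """
    a = abs(w)
    if a <= gamma * lam:
        # Lasso-like near zero, with the slope tapering off as |w| grows
        return lam * a - a * a / (2.0 * gamma)
    # Constant beyond gamma * lam, so large weights are not penalized further
    return 0.5 * gamma * lam * lam
```

The two branches meet at |w| = gamma * lam, so the penalty is continuous; unlike an L1 penalty, it stops growing for large weights.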

See also sim_data_scalar and grid_scalar_ge. The underlying model is ScalarGE.

Usage

scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs, learning_rate1, learning_rate2, lambda1 = None, lambda2 = None, Lambda = None, threshold = None, split_type = 0, ratio = [7, 3], important_feature = True, plot = True)

Parameters

The table below lists the meaning and data type of each parameter. Use it to configure a ScalarGE model.

Parameter	Description
y	array or dataframe, the response variable.
G	array or dataframe, the scalar genetic variable.
E	array or dataframe, the scalar environmental variable.
ytype	string, the type of the response y: “Survival”, “Binary” or “Continuous”.
num_hidden_layers	numeric, number of hidden layers in the neural network.
nodes_hidden_layer	list, contains number of nodes in each hidden layer.
num_epochs	numeric, number of epochs for neural network training.
learning_rate1	numeric, learning rate of sparse layers.
learning_rate2	numeric, learning rate of hidden layers.
lambda1	numeric or None, tuning parameter of the first MCP penalization.
lambda2	numeric or None, tuning parameter of the second MCP penalization.
Lambda	numeric or None, tuning parameter of L2 penalization.
threshold	numeric or None, threshold in the selection of important features.
split_type	integer, type of data split. If split_type = 0, the data is divided into a training set and a validation set. If split_type = 1, the data is divided into a training set, a validation set and a test set.
ratio	list, the ratio of the data split (default [7, 3]).
important_feature	bool, True or False, whether or not to show output features.
plot	bool, True or False, whether or not to show the line plot of residuals against the number of neural network epochs.
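To make ratio concrete, the sketch below converts a ratio list into subset sizes for a given sample size. It assumes that the entries of ratio are proportional weights and that a three-entry ratio such as [7, 2, 1] would accompany split_type = 1. Both are assumptions for illustration, and split_sizes is a hypothetical helper, not a GENetLib function; the library's own splitting and rounding may differ.

```python
def split_sizes(n, ratio):
    """Return subset sizes proportional to `ratio` that sum to n."""
    total = sum(ratio)
    sizes = [n * r // total for r in ratio]  # floor of each proportional share
    sizes[0] += n - sum(sizes)               # rounding remainder goes to the first (training) subset
    return sizes

print(split_sizes(1500, [7, 3]))     # [1050, 450]
print(split_sizes(1500, [7, 2, 1]))  # [1050, 300, 150]
```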

Value

The function scalar_ge returns a tuple containing the training results of the ScalarGE model:

Residual of the training set.
Residual of the validation set.
C-index (when y is survival) or R2 (when y is continuous or binary) of the training set.
C-index (when y is survival) or R2 (when y is continuous or binary) of the validation set.
A neural network after training.
Important features of gene variables.
Important features of G-E interaction variables.
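Because the result is a plain tuple, it can be convenient to attach names to its entries. The sketch below does so under the assumption that the tuple has seven entries in the order listed above. ScalarGEResult, label_scalar_ge_result and the field names are hypothetical and not part of GENetLib, so check the order against your installed version before relying on it.

```python
from collections import namedtuple

# Field names are illustrative; the order is assumed to follow the list above.
ScalarGEResult = namedtuple(
    "ScalarGEResult",
    ["train_residual", "valid_residual", "train_metric", "valid_metric",
     "model", "important_G", "important_GE"],
)

def label_scalar_ge_result(res):
    """Wrap the tuple returned by scalar_ge in a namedtuple."""
    expected = len(ScalarGEResult._fields)
    if len(res) != expected:
        raise ValueError("expected %d entries, got %d" % (expected, len(res)))
    return ScalarGEResult(*res)
```

If the assumed order holds, label_scalar_ge_result(res).important_G gives the selected gene features and .important_GE the selected G-E interactions, where res is the tuple returned by scalar_ge.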

In terms of visualization, when plot = True this function also draws a line plot of the residuals against the number of neural network epochs.

Examples

Here is a quick example of using this function:

from GENetLib.sim_data import sim_data_scalar
from GENetLib.scalar_ge import scalar_ge

# Model settings
ytype = 'Survival'
num_hidden_layers = 2
nodes_hidden_layer = [1000, 100]  # one entry per hidden layer
num_epochs = 100
learning_rate1 = 0.09   # learning rate of sparse layers
learning_rate2 = 0.015  # learning rate of hidden layers
lambda2 = 0.09          # tuning parameter of the second MCP penalization
Lambda = 0.2            # tuning parameter of L2 penalization

# Simulate scalar G-E data with a survival response
scalar_survival_linear = sim_data_scalar(rho_G = 0.25, rho_E = 0.3, dim_G = 500, dim_E = 5, n = 1500, dim_E_Sparse = 2, ytype = ytype, n_inter = 30)
y = scalar_survival_linear['y']
G = scalar_survival_linear['G']
E = scalar_survival_linear['E']

# Fit the ScalarGE model
scalar_ge_res = scalar_ge(y, G, E, ytype, num_hidden_layers, nodes_hidden_layer, num_epochs, learning_rate1, learning_rate2, lambda1 = None, lambda2 = lambda2, Lambda = Lambda)
