deeprvat.deeprvat.models

Module Contents

Classes

BaseModel

Base class containing functions that will be called by PyTorch Lightning in the background by default.

DeepSetAgg

Class containing the gene impairment module used for burden computation.

DeepSet

Wrapper class for burden computation that also performs phenotype prediction. It inherits from BaseModel, which is where PyTorch Lightning-specific functions like “training_step” or “validation_epoch_end” can be found. Those functions are called in the background by default.

LinearAgg

Model that captures only linear effects: it uses a single linear layer without a non-linear activation function. It still contains the gene impairment module used for burden computation.

TwoLayer

Wrapper class to capture linear effects. It inherits from BaseModel, which is where PyTorch Lightning-specific functions like “training_step” or “validation_epoch_end” can be found. Those functions are called in the background by default.

Functions

get_hparam

Data

logger

METRICS

API

deeprvat.deeprvat.models.logger = 'getLogger(...)'
deeprvat.deeprvat.models.METRICS = None
deeprvat.deeprvat.models.get_hparam(module: pytorch_lightning.LightningModule, param: str, default: Any)
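
The signature suggests a hyperparameter lookup with a fallback value. The following is a minimal sketch of such a helper, assuming it reads the attribute `param` from the module's `hparams` namespace and returns `default` when the attribute is absent. That behaviour is an assumption and is not stated in this reference; `get_hparam_sketch` and the `SimpleNamespace` stand-in for a LightningModule are hypothetical names used only for illustration.

```python
from types import SimpleNamespace
from typing import Any


def get_hparam_sketch(module: Any, param: str, default: Any) -> Any:
    """Return module.hparams.<param> if present, otherwise `default` (assumed behaviour)."""
    hparams = getattr(module, "hparams", None)
    return getattr(hparams, param, default)


# Stand-in for a LightningModule that only carries hyperparameters.
module = SimpleNamespace(hparams=SimpleNamespace(stage="val"))

stage = get_hparam_sketch(module, "stage", "train")    # set on the module
patience = get_hparam_sketch(module, "patience", 10)   # missing, falls back to default
```
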
class deeprvat.deeprvat.models.BaseModel(config: dict, n_annotations: Dict[str, int], n_covariates: Dict[str, int], n_genes: Dict[str, int], phenotypes: List[str], stage: str = 'train', **kwargs)

Bases: pytorch_lightning.LightningModule

Base class containing functions that will be called by PyTorch Lightning in the background by default.

Initialization

Initializes BaseModel.

Parameters:
  • config (dict) – Represents the content of config.yaml.

  • n_annotations (Dict[str, int]) – Contains the number of annotations used for each phenotype.

  • n_covariates (Dict[str, int]) – Contains the number of covariates used for each phenotype.

  • n_genes (Dict[str, int]) – Contains the number of genes used for each phenotype.

  • phenotypes (List[str]) – Contains the phenotypes used during training.

  • stage (str) – Contains a prefix indicating the dataset the model is operating on. Defaults to “train”. (optional)

  • kwargs – Additional keyword arguments.

configure_optimizers() → torch.optim.Optimizer

Sets up an optimizer and a scheduler from the parameters specified in config.

training_step(batch: dict, batch_idx: int) → torch.Tensor

Called by the trainer during training. Returns the loss used to update weights and biases.

Parameters:
  • batch (dict) – A dictionary containing the batch data.

  • batch_idx (int) – The index of the current batch.

Returns:

torch.Tensor: The loss value computed to update weights and biases based on the predictions.

Raises:

RuntimeError – If NaNs are found in the training loss.
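
The NaN guard described above can be sketched as follows. This is an illustration only: it uses a plain float in place of a `torch.Tensor` loss, and the function name `check_loss` and the error message are hypothetical.

```python
import math


def check_loss(loss: float) -> float:
    """Return the loss unchanged, or raise RuntimeError if it is NaN."""
    if math.isnan(loss):
        raise RuntimeError("NaNs found in training loss")
    return loss


ok = check_loss(0.25)

try:
    check_loss(float("nan"))
    raised = False
except RuntimeError:
    raised = True
```

Failing fast here stops training as soon as the loss becomes undefined, instead of continuing with weights updated from NaN gradients.
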

validation_step(batch: dict, batch_idx: int)

During validation, we do not compute backward passes, so we can accumulate phenotype predictions and evaluate them afterward as a whole.

Parameters:
  • batch (dict) – A dictionary containing the validation batch data.

  • batch_idx (int) – The index of the current validation batch.

Returns:

dict: A dictionary containing phenotype predictions (“y_pred_by_pheno”) and corresponding ground truth values (“y_by_pheno”).

validation_epoch_end(prediction_y: List[Dict[str, Dict[str, torch.Tensor]]])

Evaluate accumulated phenotype predictions at the end of the validation epoch.

This function takes a list of dictionaries containing accumulated phenotype predictions and corresponding ground truth values obtained during the validation process. It computes various metrics based on these predictions and logs the results.

Parameters:

prediction_y (List[Dict[str, Dict[str, torch.Tensor]]]) – A list of dictionaries containing accumulated phenotype predictions and corresponding ground truth values obtained during the validation process.

Returns:

None

Return type:

None
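
The accumulate-then-evaluate pattern can be sketched as below. Plain lists stand in for `torch.Tensor`, the phenotype name `pheno_a` is hypothetical, and mean squared error is used only as an example metric; the metrics that are actually computed are not listed in this reference.

```python
from typing import Dict, List

# Per-batch outputs shaped like the validation_step return value.
StepOutput = Dict[str, Dict[str, List[float]]]


def gather_by_phenotype(outputs: List[StepOutput], key: str) -> Dict[str, List[float]]:
    """Concatenate the per-batch values stored under `key` for every phenotype."""
    gathered: Dict[str, List[float]] = {}
    for out in outputs:
        for pheno, values in out[key].items():
            gathered.setdefault(pheno, []).extend(values)
    return gathered


def mean_squared_error(y_pred: List[float], y: List[float]) -> float:
    """Example metric evaluated once over all accumulated samples."""
    return sum((p - t) ** 2 for p, t in zip(y_pred, y)) / len(y)


prediction_y: List[StepOutput] = [
    {"y_pred_by_pheno": {"pheno_a": [1.0, 2.0]}, "y_by_pheno": {"pheno_a": [1.0, 4.0]}},
    {"y_pred_by_pheno": {"pheno_a": [3.0]}, "y_by_pheno": {"pheno_a": [3.0]}},
]

y_pred = gather_by_phenotype(prediction_y, "y_pred_by_pheno")
y_true = gather_by_phenotype(prediction_y, "y_by_pheno")
mse = {pheno: mean_squared_error(y_pred[pheno], y_true[pheno]) for pheno in y_pred}
```
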

test_step(batch: dict, batch_idx: int)

During testing, we do not compute backward passes, so we can accumulate phenotype predictions and evaluate them afterward as a whole.

Parameters:
  • batch (dict) – A dictionary containing the testing batch data.

  • batch_idx (int) – The index of the current testing batch.

Returns:

dict: A dictionary containing phenotype predictions (“y_pred”) and corresponding ground truth values (“y”).

Return type:

dict

test_epoch_end(prediction_y: List[Dict[str, torch.Tensor]])

Evaluate accumulated phenotype predictions at the end of the testing epoch.

Parameters:

prediction_y (List[Dict[str, torch.Tensor]]) – A list of dictionaries containing accumulated phenotype predictions and corresponding ground truth values obtained during the testing process.

configure_callbacks()
class deeprvat.deeprvat.models.DeepSetAgg(n_annotations: int, phi_layers: int, phi_hidden_dim: int, rho_layers: int, rho_hidden_dim: int, activation: str, pool: str, output_dim: int = 1, dropout: Optional[float] = None, use_sigmoid: bool = False, reverse: bool = False)

Bases: pytorch_lightning.LightningModule

Class containing the gene impairment module used for burden computation.

Variants are fed through an embedding network Phi to compute a variant embedding. The variant embedding is processed by a permutation-invariant aggregation to yield a gene embedding. Afterward, the second network Rho estimates the final gene impairment score. All parameters of the gene impairment module are shared across genes and traits.
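
The Phi, pooling, and Rho pipeline described above can be illustrated with a toy pure-Python sketch for a single gene. The layer sizes, weights, and function names are made up for illustration and do not reflect the actual module, which operates on batched tensors.

```python
from typing import Callable, List

Vector = List[float]


def phi(annotations: Vector) -> Vector:
    """Toy variant embedding: one linear layer followed by ReLU (weights are made up)."""
    weights = [[0.5, -1.0, 0.25], [1.0, 0.5, -0.5]]  # 2 hidden units x 3 annotations
    return [max(0.0, sum(w * a for w, a in zip(row, annotations))) for row in weights]


def pool(embeddings: List[Vector], how: str) -> Vector:
    """Permutation-invariant aggregation over variants: element-wise 'sum' or 'max'."""
    reduce: Callable = sum if how == "sum" else max
    return [reduce(column) for column in zip(*embeddings)]


def rho(gene_embedding: Vector) -> float:
    """Toy second network: a single linear unit mapping the gene embedding to a score."""
    weights = [0.75, -0.25]
    return sum(w * e for w, e in zip(weights, gene_embedding))


def gene_impairment(variants: List[Vector], how: str = "sum") -> float:
    """Embed every variant, aggregate over variants, then score the gene."""
    return rho(pool([phi(v) for v in variants], how))


# Three variants of one gene, each described by three annotations.
variants = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [2.0, 1.0, 1.0]]
score = gene_impairment(variants)
score_permuted = gene_impairment(list(reversed(variants)))
```

Because the pooling step ignores the order of its inputs, `score` and `score_permuted` agree, which is the property that lets the module treat the variants of a gene as an unordered set.
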

Initialization

Initializes the DeepSetAgg module.

Parameters:
  • n_annotations (int) – Number of annotations.

  • phi_layers (int) – Number of layers in Phi.

  • phi_hidden_dim (int) – Internal dimensionality of linear layers in Phi.

  • rho_layers (int) – Number of layers in Rho.

  • rho_hidden_dim (int) – Internal dimensionality of linear layers in Rho.

  • activation (str) – Activation function used; should match its name in torch.nn.

  • pool (str) – Invariant aggregation function used to aggregate gene variants. Possible values: ‘max’, ‘sum’.

  • output_dim (int) – Number of burden scores. Defaults to 1. (optional)

  • dropout (Optional[float]) – Dropout probability, i.e. the probability with which elements are set to 0. (optional)

  • use_sigmoid (bool) – Whether to project burden scores to [0, 1]. Also used as a linear activation function during training. Defaults to False. (optional)

  • reverse (bool) – Whether to reverse the burden score (used during association testing). Defaults to False. (optional)

set_reverse(reverse: bool = True)

Reverse burden score during association testing if the model predicts in negative space.

Parameters:

reverse (bool) – Indicates whether the ‘reverse’ attribute should be set to True or False. Defaults to True.

Note: See reverse_models() in associate.py for further detail.
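
A toy sketch of how a `reverse` flag could interact with a sigmoid projection. It assumes that reversing means flipping the sign of the raw score and that the flip happens before the optional sigmoid; both are assumptions made for illustration and are not confirmed by this reference. `BurdenHead` is a hypothetical stand-in class.

```python
import math
from typing import List


class BurdenHead:
    """Toy stand-in for an aggregation module holding `reverse` and `use_sigmoid` flags."""

    def __init__(self, use_sigmoid: bool = False, reverse: bool = False) -> None:
        self.use_sigmoid = use_sigmoid
        self.reverse = reverse

    def set_reverse(self, reverse: bool = True) -> None:
        self.reverse = reverse

    def __call__(self, raw_scores: List[float]) -> List[float]:
        # Assumed behaviour: reversing flips the sign of the raw score.
        scores = [-s for s in raw_scores] if self.reverse else list(raw_scores)
        if self.use_sigmoid:
            # Project the scores to [0, 1].
            scores = [1.0 / (1.0 + math.exp(-s)) for s in scores]
        return scores


head = BurdenHead(use_sigmoid=True)
before = head([-2.0, 0.0, 1.5])
head.set_reverse()
after = head([-2.0, 0.0, 1.5])
```

Under these assumptions the reversed scores mirror the originals around 0.5, so the ranking of samples is inverted while the scores stay in [0, 1].
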

forward(x)

Perform a forward pass through the model.

Parameters:

x (tensor) – Batched input data

Returns:

Burden scores

Return type:

tensor

class deeprvat.deeprvat.models.DeepSet(config: dict, n_annotations: Dict[str, int], n_covariates: Dict[str, int], n_genes: Dict[str, int], phenotypes: List[str], agg_model: Optional[torch.nn.Module] = None, use_sigmoid: bool = False, reverse: bool = False, **kwargs)

Bases: deeprvat.deeprvat.models.BaseModel

Wrapper class for burden computation that also performs phenotype prediction. It inherits from BaseModel, which is where PyTorch Lightning-specific functions like “training_step” or “validation_epoch_end” can be found. Those functions are called in the background by default.

Initialization

Initialize the DeepSet model.

Parameters:
  • config (dict) – Represents the content of config.yaml.

  • n_annotations (Dict[str, int]) – Contains the number of annotations used for each phenotype.

  • n_covariates (Dict[str, int]) – Contains the number of covariates used for each phenotype.

  • n_genes (Dict[str, int]) – Contains the number of genes used for each phenotype.

  • phenotypes (List[str]) – Contains the phenotypes used during training.

  • agg_model (Optional[pl.LightningModule / nn.Module]) – Model used for burden computation. If not provided, it will be initialized. (optional)

  • use_sigmoid (bool) – Determines if burden scores should be projected to [0, 1]. Acts as a linear activation function to mimic association testing during training.

  • reverse (bool) – Determines if the burden score should be reversed (used during association testing).

  • kwargs – Additional keyword arguments.

forward(batch)

Forward pass through the model.

Parameters:

batch (dict) – Dictionary of phenotypes, each containing the following keys:

  • indices (tensor): Indices for the underlying dataframe.

  • covariates (tensor): Covariates of samples, e.g., age. Content: samples x covariates.

  • rare_variant_annotations (tensor): Annotated genomic variants. Content: samples x genes x annotations x variants.

  • y (tensor): Actual phenotypes (ground truth data).

Returns:

Dictionary containing predicted phenotypes

Return type:

dict
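
To make the expected batch layout concrete, the sketch below builds a batch for one hypothetical phenotype, `pheno_a`, using nested lists in place of tensors. The dimension sizes are arbitrary, and the shape chosen for `y` (samples x 1) is an assumption, since only the shapes of the covariates and annotations are stated above.

```python
from typing import Any, Dict, List


def zeros(*shape: int) -> Any:
    """Nested list of zeros with the given shape (stand-in for a tensor)."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def shape_of(nested: Any) -> List[int]:
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return dims


n_samples, n_genes, n_annotations, n_variants, n_covariates = 4, 3, 5, 2, 6

# One entry per phenotype; the keys follow the description above.
batch: Dict[str, Dict[str, Any]] = {
    "pheno_a": {
        "indices": list(range(n_samples)),
        "covariates": zeros(n_samples, n_covariates),
        "rare_variant_annotations": zeros(n_samples, n_genes, n_annotations, n_variants),
        "y": zeros(n_samples, 1),  # assumed shape
    }
}

shapes = {key: shape_of(value) for key, value in batch["pheno_a"].items()}
```
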

class deeprvat.deeprvat.models.LinearAgg(n_annotations: int, pool: str, output_dim: int = 1, reverse: bool = False)

Bases: pytorch_lightning.LightningModule

Model that captures only linear effects: it uses a single linear layer without a non-linear activation function. It still contains the gene impairment module used for burden computation.
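
A toy sketch of linear aggregation for a single gene: one linear unit is applied to the annotations of every variant, and the per-variant outputs are pooled without any activation in between. The weights, the data layout (annotations x variants), and the function name `linear_agg` are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]  # annotations x variants for a single gene


def linear_agg(gene: Matrix, weights: List[float], bias: float = 0.0, pool: str = "sum") -> float:
    """Apply one linear unit to every variant, then pool over variants (no activation)."""
    n_variants = len(gene[0])
    per_variant = [
        sum(w * gene[a][v] for a, w in enumerate(weights)) + bias
        for v in range(n_variants)
    ]
    return sum(per_variant) if pool == "sum" else max(per_variant)


# Two annotations (rows) for three variants (columns); weights are made up.
gene = [[1.0, 0.0, 2.0],
        [0.5, 1.0, 0.0]]
weights = [2.0, -1.0]

burden_sum = linear_agg(gene, weights, pool="sum")
burden_max = linear_agg(gene, weights, pool="max")
```

With sum pooling and no bias, the resulting score is a linear function of the annotation values, which is what restricts such a model to linear effects.
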

Initialization

Initialize the LinearAgg model.

Parameters:
  • n_annotations (int) – Number of annotations.

  • pool (str) – Pooling method (“sum” or “max”) to be used.

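  • output_dim (int) – Dimensionality of the output. Defaults to 1. (optional)

  • reverse (bool) – Whether to reverse the burden score (used during association testing). Defaults to False. (optional)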

set_reverse(reverse: bool = True)

Reverse burden score during association testing if the model predicts in negative space.

Parameters:

reverse (bool) – Indicates whether the ‘reverse’ attribute should be set to True or False. Defaults to True.

Note: See reverse_models() in associate.py for further detail.

forward(x)

Perform a forward pass through the model.

Parameters:

x (tensor) – Batched input data

Returns:

Burden scores

Return type:

tensor

class deeprvat.deeprvat.models.TwoLayer(config: dict, n_annotations: int, n_covariates: int, n_genes: int, agg_model: Optional[torch.nn.Module] = None, **kwargs)

Bases: deeprvat.deeprvat.models.BaseModel

Wrapper class to capture linear effects. It inherits from BaseModel, which is where PyTorch Lightning-specific functions like “training_step” or “validation_epoch_end” can be found. Those functions are called in the background by default.

Initialization

Initializes the TwoLayer model.

Parameters:
  • config (dict) – Represents the content of config.yaml.

  • n_annotations (int) – Number of annotations.

  • n_covariates (int) – Number of covariates.

  • n_genes (int) – Number of genes.

  • agg_model (Optional[nn.Module]) – Model used for burden computation. If not provided, it will be initialized. (optional)

  • kwargs – Additional keyword arguments.

forward(batch)

Forward pass through the model.

Parameters:

batch (dict) – Dictionary of phenotypes, each containing the following keys:

  • indices (tensor): Indices for the underlying dataframe.

  • covariates (tensor): Covariates of samples, e.g., age. Content: samples x covariates.

  • rare_variant_annotations (tensor): Annotated genomic variants. Content: samples x genes x annotations x variants.

  • y (tensor): Actual phenotypes (ground truth data).

Returns:

Dictionary containing predicted phenotypes

Return type:

dict