Training and association testing

If you want to use the pretrained DeepRVAT model provided as part of the package, or a custom pretrained model, we have set up pipelines that run only the association testing stage. This stage comprises creating the association dataset files, computing gene impairment scores, running regression, and evaluating the results.

Configuration and input files

Configuration parameters must be specified in deeprvat_input_config.yaml. For details on the meanings of the parameters and the format of input files, see here.

You must specify

use_pretrained_models: True

in your configuration file.

The following parameters specify the locations of required input files:

pretrained_model_paths
gt_filename
variant_filename
phenotype_filename
annotation_filename
gene_filename
seed_gene_results
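
Before launching the pipeline, it can help to confirm that the configuration names every input listed above. The following is a minimal sketch and not part of DeepRVAT: the helper missing_config_keys, the example dictionary, and the file names in it are hypothetical. It only checks that each key is present and that use_pretrained_models is set to True. It does not check that the files exist or are correctly formatted. In practice the dictionary would come from parsing deeprvat_input_config.yaml with a YAML library.

```python
# Keys listed above as the locations of required input files.
REQUIRED_INPUT_KEYS = [
    "pretrained_model_paths",
    "gt_filename",
    "variant_filename",
    "phenotype_filename",
    "annotation_filename",
    "gene_filename",
    "seed_gene_results",
]


def missing_config_keys(config):
    """Return the names of required settings that are absent or wrong.

    `config` is a dictionary holding the parsed configuration file.
    """
    problems = [key for key in REQUIRED_INPUT_KEYS if key not in config]
    # The pretrained pipeline also needs use_pretrained_models: True.
    if config.get("use_pretrained_models") is not True:
        problems.append("use_pretrained_models")
    return problems


# Hypothetical, deliberately incomplete configuration for illustration only.
example_config = {
    "use_pretrained_models": True,
    "gt_filename": "genotypes.h5",
    "variant_filename": "variants.parquet",
}

problems = missing_config_keys(example_config)
print(problems)
```

An empty list means every key named above is present and use_pretrained_models is True.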

The following parameters specify options for running DeepRVAT. Those marked (optional) have default values; see here for details.

phenotypes_for_association_testing
phenotypes_for_training
rare_variant_annotations
covariates
training
n_repeats
evaluation
y_transformation (optional)
association_testing_data_thresholds (optional)
training_data_thresholds (optional)
cv_options (required only when running cross validation)

Note that the file specified by annotation_filename must contain one column for each annotation listed under rare_variant_annotations in deeprvat_input_config.yaml.
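
The column requirement above can be checked before running the pipeline. This is a sketch under stated assumptions rather than DeepRVAT code: the function missing_annotation_columns and the annotation and column names are hypothetical. It assumes you have already obtained the annotation file's column names as a list of strings, using whichever reader matches the file's format.

```python
def missing_annotation_columns(config, annotation_columns):
    """Return configured annotations that have no column in the annotation file.

    `config` is the parsed configuration dictionary.
    `annotation_columns` is the list of column names in the annotation file.
    """
    wanted = config.get("rare_variant_annotations", [])
    available = set(annotation_columns)
    # Preserve the order in which annotations appear in the configuration.
    return [name for name in wanted if name not in available]


# Hypothetical annotation and column names for illustration only.
example_config = {"rare_variant_annotations": ["annot_a", "annot_b", "annot_c"]}
example_columns = ["id", "annot_a", "annot_c"]

missing = missing_annotation_columns(example_config, example_columns)
print(missing)
```

Any name in the returned list is an annotation that must either be added to the annotation file or removed from rare_variant_annotations.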

Executing the pipeline

snakemake -j 1 --snakefile [path_to_deeprvat]/pipelines/association_testing_pretrained.snakefile

Replace [path_to_deeprvat] with the path to your copy of the DeepRVAT repository.
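
If you launch the pipeline from a script, the command above can be assembled programmatically. The sketch below is illustrative only: build_snakemake_command and the example path are hypothetical names. The optional -n flag is Snakemake's dry-run option, which lists the planned jobs without executing them. It is an addition here and not part of the command shown above.

```python
from pathlib import Path


def build_snakemake_command(deeprvat_dir, jobs=1, dry_run=False):
    """Build the argument list for the pretrained association testing pipeline."""
    snakefile = (
        Path(deeprvat_dir) / "pipelines" / "association_testing_pretrained.snakefile"
    )
    command = ["snakemake", "-j", str(jobs), "--snakefile", str(snakefile)]
    if dry_run:
        # Snakemake's dry-run flag: show what would run without running it.
        command.append("-n")
    return command


# Placeholder path; substitute the location of your DeepRVAT repository.
command = build_snakemake_command("/path/to/deeprvat", jobs=1, dry_run=True)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to start the run.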

Using cross validation

Coming soon

Running the association testing pipeline with REGENIE

Coming soon