`deeprvat.data.rare`

Module Contents

Classes

`PaddedAnnotations`
`SparseGenotype`

Data

logger

API

deeprvat.data.rare.logger = 'getLogger(...)'

class deeprvat.data.rare.PaddedAnnotations(base_dataset, annotations: List[str], thresholds: Dict[str, str] = None, gene_file: Optional[str] = None, genes_to_keep: Optional[Set[str]] = None, pad_value: Union[float, int, str] = 0.0, verbose: bool = False, low_memory: bool = False, skip_embedding: bool = False)

embed(idx: int, variant_ids: numpy.ndarray, genotype: numpy.ndarray) → List[List[torch.Tensor]]

Returns: List[List[torch.Tensor]]

The outer list has one element per gene. Each inner list holds the annotations for that gene's variants in this sample, one element per variant.
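The nested return structure can be illustrated with a minimal sketch. This is not the library's implementation: `group_annotations_by_gene`, `variant_gene`, and `annotation_matrix` are hypothetical names, NumPy arrays stand in for `torch.Tensor`, and each variant is assumed to map to exactly one gene.

```python
import numpy as np

def group_annotations_by_gene(variant_ids, variant_gene, annotation_matrix, n_genes):
    """Group per-variant annotation vectors into one list per gene.

    Hypothetical helper: assumes variant_gene[v] is the single gene index of
    variant v and annotation_matrix[v] is its annotation vector.
    """
    per_gene = [[] for _ in range(n_genes)]
    for vid in variant_ids:
        per_gene[variant_gene[vid]].append(annotation_matrix[vid])
    return per_gene

# Toy data: 4 variants, 3 annotations each, spread over 3 genes.
variant_gene = np.array([0, 0, 2, 1])
annotation_matrix = np.arange(12, dtype=np.float32).reshape(4, 3)

# This sample carries variants 0, 2 and 1 (in that order).
nested = group_annotations_by_gene(np.array([0, 2, 1]), variant_gene, annotation_matrix, n_genes=3)
print([len(gene) for gene in nested])
```

Genes without any variant for the sample keep an empty inner list, so the outer list always has one entry per gene.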

collate_fn(batch: List[List[List[numpy.ndarray]]], device: torch.device = torch.device('cpu')) → torch.Tensor

Returns: torch.Tensor

Tensor dimensions: samples x genes x annotations x variants. The last dimension is padded so that all variants fit.
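The padding step can be sketched as follows. This is an illustrative assumption about how such a collate function might work, not the actual `collate_fn`: `pad_variants` and `n_annotations` are hypothetical names, each innermost array is assumed to be one variant's annotation vector, and a NumPy array is returned where the real method returns a `torch.Tensor`.

```python
import numpy as np

def pad_variants(batch, n_annotations, pad_value=0.0):
    """Pad a samples x genes x variants nested list into a dense array.

    Output shape: (samples, genes, annotations, max_variants), where
    max_variants is the largest per-gene variant count in the batch.
    """
    n_samples = len(batch)
    n_genes = len(batch[0]) if batch else 0
    max_variants = max(
        (len(gene) for sample in batch for gene in sample), default=0
    )
    out = np.full(
        (n_samples, n_genes, n_annotations, max_variants),
        pad_value,
        dtype=np.float32,
    )
    for i, sample in enumerate(batch):
        for j, gene in enumerate(sample):
            for k, variant in enumerate(gene):
                # Each variant's annotation vector fills one column.
                out[i, j, :, k] = variant
    return out

# Two samples, two genes, three annotations per variant.
batch = [
    [[np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])], []],
    [[np.array([7.0, 8.0, 9.0])], []],
]
padded = pad_variants(batch, n_annotations=3)
print(padded.shape)  # (2, 2, 3, 2)
```

Positions with no variant keep `pad_value`, which here plays the role of the `pad_value` constructor argument. The resulting array could be converted with `torch.from_numpy` if a tensor is needed.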

setup_annotations(rare_variant_ids: pandas.Series, thresholds: Optional[Dict[str, str]], gene_file: Optional[str], genes_to_keep: Optional[Set[str]] = None)

apply_thresholds(thresholds: Optional[Dict[str, str]])

remap_group_ids()

setup_metadata()

get_metadata() → Dict[str, numpy.ndarray]

class deeprvat.data.rare.SparseGenotype(base_dataset, annotations: List[str], thresholds: Dict[str, str] = None, gene_file: Optional[str] = None, genes_to_keep: Optional[Set[str]] = None, verbose: bool = False, low_memory: bool = False)