deeprvat.data.rare

Module Contents

Classes

PaddedAnnotations

SparseGenotype

Data

logger

API

deeprvat.data.rare.logger = 'getLogger(...)'
class deeprvat.data.rare.PaddedAnnotations(base_dataset, annotations: List[str], thresholds: Dict[str, str] = None, gene_file: Optional[str] = None, genes_to_keep: Optional[Set[str]] = None, pad_value: Union[float, int, str] = 0.0, verbose: bool = False, low_memory: bool = False, skip_embedding: bool = False)

Initialization

embed(idx: int, variant_ids: numpy.ndarray, genotype: numpy.ndarray) -> List[List[torch.Tensor]]

Returns: List[List[torch.Tensor]]

One outer list element per gene. Each inner list holds one element per variant that this sample carries in that gene, and each element contains the annotations for that variant.

collate_fn(batch: List[List[List[numpy.ndarray]]], device: torch.device = torch.device('cpu')) -> torch.Tensor

Returns: torch.Tensor

Dimensions of tensor: samples x genes x annotations x variants. Last dimension is padded to fit all variants.
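
The padding described above can be illustrated with a minimal sketch. It is not the library's implementation: pad_batch and its n_annotations parameter are hypothetical names, and NumPy stands in for torch to keep the example dependency-light. It assumes each variant is a 1-D array of annotation values and that padding uses a constant fill value, as the pad_value argument of the constructor suggests.

```python
import numpy as np


def pad_batch(batch, n_annotations, pad_value=0.0):
    """Pad a ragged batch into a dense array (hypothetical helper).

    batch: list over samples -> list over genes -> list of 1-D arrays,
           one array of length n_annotations per variant.
    Returns an array of shape samples x genes x annotations x variants.
    """
    n_samples = len(batch)
    n_genes = len(batch[0])
    # The last dimension must fit the gene with the most variants.
    max_variants = max(
        (len(gene) for sample in batch for gene in sample), default=0
    )
    out = np.full(
        (n_samples, n_genes, n_annotations, max_variants),
        pad_value,
        dtype=np.float32,
    )
    for i, sample in enumerate(batch):
        for j, gene in enumerate(sample):
            for k, variant in enumerate(gene):
                # Column k of the last axis holds the annotations of variant k.
                out[i, j, :, k] = variant
    return out


# Two samples, two genes, two annotations per variant.
batch = [
    [[np.array([1.0, 2.0]), np.array([3.0, 4.0])], []],  # gene 1 has no variants
    [[np.array([5.0, 6.0])], [np.array([7.0, 8.0])]],
]
padded = pad_batch(batch, n_annotations=2)
print(padded.shape)  # (2, 2, 2, 2)
```

Genes with fewer variants than the maximum keep pad_value in the unused trailing columns, so downstream code sees a single rectangular tensor regardless of how many variants each sample carries.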

setup_annotations(rare_variant_ids: pandas.Series, thresholds: Optional[Dict[str, str]], gene_file: Optional[str], genes_to_keep: Optional[Set[str]] = None)
apply_thresholds(thresholds: Optional[Dict[str, str]])
remap_group_ids()
setup_metadata()
get_metadata() -> Dict[str, numpy.ndarray]
class deeprvat.data.rare.SparseGenotype(base_dataset, annotations: List[str], thresholds: Dict[str, str] = None, gene_file: Optional[str] = None, genes_to_keep: Optional[Set[str]] = None, verbose: bool = False, low_memory: bool = False)

Initialization

embed(idx: int, variant_ids: numpy.ndarray, genotype: numpy.ndarray) -> scipy.sparse.coo_matrix

Returns: scipy.sparse.coo_matrix

A sparse matrix for this sample, built from variant_ids and genotype.

collate_fn(batch: List[scipy.sparse.coo_matrix]) -> scipy.sparse.coo_matrix
setup_annotations(rare_variant_ids: pandas.Series, thresholds: Optional[Dict[str, str]], gene_file: Optional[str], genes_to_keep: Optional[Set[str]] = None)
apply_thresholds(thresholds: Optional[Dict[str, str]])
remap_group_ids()
setup_metadata()
get_metadata() -> Dict[str, numpy.ndarray]