linkapy.parsing
Classes
- Linkapy_Parser: Mainly used to create matrices (arrow format for RNA, mtx format for accessibility / methylation).
Functions
- parse_rna: Read one or more featureCounts files, combine them, and write the result to a counts arrow file and a metadata arrow file.
- read_rna_to_anndata: From a prefix, read the count matrix and its metadata, and combine them into an AnnData object.
- read_meth_to_anndata: From a prefix, read the fraction matrices and their metadata, and combine them into an AnnData object.
- match_cells: Take a list of lists containing putative cell names and find a 'best match' per list.
- get_common_cellname
Module Contents
- class linkapy.parsing.Linkapy_Parser(methylation_path=None, transcriptome_path=None, output='linkapy_output', methylation_pattern=('*GC*tsv.gz',), methylation_pattern_names=(), transcriptome_pattern=('*tsv',), transcriptome_pattern_names=(), NOMe=False, threads=1, chromsizes=None, regions=None, blacklist=None, binsize=10000, project='linkapy', verbose=False)
Linkapy_Parser is mainly used to create matrices (arrow format for RNA, mtx format for accessibility / methylation) from directories containing processed multi-modal single-cell data.
- For each of the following, at least one option must be provided:
methylation_path and/or transcriptome_path
regions or chromsizes file (if methylation_path is provided).
- Parameters:
methylation_path (str) – The path to the methylation directory (will be searched recursively!).
transcriptome_path (str) – The path to the RNA output directory (will be searched recursively!).
output (str) – The output directory the matrices are written to. Defaults to the folder ‘linkapy_output’ in the current working directory.
methylation_pattern (tuple) – The glob pattern(s) used to search the methylation path recursively. Defaults to (‘*GC*tsv.gz’,). Note that this is a tuple.
transcriptome_pattern (tuple) – The glob pattern(s) used to search the transcriptome path recursively. Defaults to (‘*tsv’,). Note that this is a tuple.
NOMe (bool) – If set, methylation_path is searched for NOMe-seq data using the patterns (‘GCHN’, ‘WCGN’).
threads (int) – Number of threads to use for parsing. Defaults to 1.
chromsizes (str) – Path to the chromsizes file for the genome. If set, the methylation signal is aggregated over bins (see binsize).
regions (tuple) – Path or paths to bed files containing regions to aggregate methylation signal over. Can be gzipped. Note that this is a tuple.
blacklist (tuple) – Path or paths to bed files containing regions to exclude from the aggregation. Can be gzipped. Note that this is a tuple.
binsize (int) – Size of the bins to aggregate over. Only relevant if no regions are provided. Defaults to 10000.
project (str) – Name of the project. Will be treated as a prefix for the output files. Defaults to ‘linkapy’.
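To make the chromsizes and binsize parameters concrete, the following is a minimal sketch of how fixed-size bins could be derived from a chromsizes file. It is not linkapy's implementation: the helper names read_chromsizes and make_bins are hypothetical, and the half-open coordinates and the truncated last bin per chromosome are assumptions.

```python
from typing import Dict, Iterator, Tuple


def read_chromsizes(text: str) -> Dict[str, int]:
    """Parse tab-separated 'chrom<TAB>length' lines (hypothetical helper)."""
    sizes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        chrom, length = line.split("\t")[:2]
        sizes[chrom] = int(length)
    return sizes


def make_bins(sizes: Dict[str, int], binsize: int = 10000) -> Iterator[Tuple[str, int, int]]:
    """Yield half-open (chrom, start, end) bins (hypothetical helper).

    Assumption: the last bin of each chromosome is truncated at the chromosome end.
    """
    for chrom, length in sizes.items():
        for start in range(0, length, binsize):
            yield chrom, start, min(start + binsize, length)


# Toy chromsizes content; real files list one chromosome per line.
chromsizes_text = "chr1\t25000\nchr2\t10000\n"
bins = list(make_bins(read_chromsizes(chromsizes_text), binsize=10000))
for b in bins:
    print(b)
```

With the default binsize of 10000, a 25000 bp chromosome yields three bins in this sketch, the last one covering only the remaining 5000 bp.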
- output
- project = 'linkapy'
- logfile
- logger
- methylation_path
- transcriptome_path
- chromsizes
- regions
- blacklist
- threads = 1
- methylation_pattern = ('*GC*tsv.gz',)
- methylation_pattern_names = ()
- transcriptome_pattern = ('*tsv',)
- transcriptome_pattern_names = ()
- binsize = 10000
- _validate()
Validate the provided paths and parameters.
- _glob()
Discover files to aggregate over based on the paths and patterns provided.
- parse()
Parse the globbed files and create the different matrices and their corresponding metadata.
- dump_mudata()
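As an illustration of the recursive file discovery described for _glob(), here is a small sketch that searches a directory tree with the default methylation pattern. It only uses the standard library; the function discover and the toy file names are hypothetical and are not part of linkapy.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Sequence


def discover(root: Path, patterns: Sequence[str]) -> Dict[str, List[Path]]:
    """Recursively collect files under root, grouped by glob pattern (hypothetical helper)."""
    return {pattern: sorted(root.rglob(pattern)) for pattern in patterns}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "plate1").mkdir()
    # Two toy methylation files in a subdirectory and one unrelated file.
    for name in ("plate1/cellA_GCHN.tsv.gz", "plate1/cellB_GCHN.tsv.gz", "plate1/notes.txt"):
        (root / name).touch()

    # Patterns are passed as a tuple, mirroring the documented default.
    found = discover(root, ("*GC*tsv.gz",))
    matched_names = [p.name for p in found["*GC*tsv.gz"]]
    print(matched_names)
```

Because the search is recursive, files in nested subdirectories are picked up as well, so the patterns should be specific enough to exclude unrelated files.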
- linkapy.parsing.parse_rna(files, prefix) → None
Read one or more featureCounts files, combine them, and write the result to a counts arrow file and a metadata arrow file.
- linkapy.parsing.read_rna_to_anndata(prefix) → anndata.AnnData
From a prefix, read the count matrix and its metadata, and combine them into an AnnData object.
- linkapy.parsing.read_meth_to_anndata(prefix) → anndata.AnnData
From a prefix, read the fraction matrices and their metadata, and combine them into an AnnData object.
- linkapy.parsing.match_cells(_l: List[List[str]], patterns: List[str], logger) → tuple[List[List[str]], pandas.DataFrame] | tuple[None, None]
Take a list of lists containing putative cell names and find a ‘best match’ per list. This is needed because file names often carry an assay- or context-specific prefix or suffix, and the cells have to be matched across assays for the mudata object.
- linkapy.parsing.get_common_cellname(cellnames: List[str]) → str | float
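The matching problem described for match_cells can be illustrated with a small sketch: the same cell appears under different names per assay, and a shared core has to be recovered. The approach below (pairwise longest common substring via difflib) is an assumption made for illustration and is not necessarily what linkapy does; the helpers longest_common_substring and common_core and the toy names are hypothetical.

```python
from difflib import SequenceMatcher
from functools import reduce
from typing import List


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous block shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a : m.a + m.size]


def common_core(names: List[str]) -> str:
    """Reduce pairwise to a substring shared by all names (hypothetical helper).

    Note: pairwise reduction yields a common substring, which is not
    guaranteed to be the longest one across all names.
    """
    core = reduce(longest_common_substring, names)
    return core.strip("_.-")  # drop separator characters left at the edges


# The same cell, named with assay-specific suffixes or prefixes (toy data).
names = ["cell01_GCHN", "cell01_WCGN", "RNA_cell01"]
print(common_core(names))
```

A purely string-based approach like this can fail when assay tags share long substrings with each other, which is one reason an explicit list of patterns is useful for matching.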