linkapy.parsing
===============

.. py:module:: linkapy.parsing


Classes
-------

.. autoapisummary::

   linkapy.parsing.Linkapy_Parser


Functions
---------

.. autoapisummary::

   linkapy.parsing.parse_rna
   linkapy.parsing.read_rna_to_anndata
   linkapy.parsing.read_meth_to_anndata
   linkapy.parsing.match_cells
   linkapy.parsing.get_common_cellname


Module Contents
---------------

.. py:class:: Linkapy_Parser(methylation_path=None, transcriptome_path=None, output='linkapy_output', methylation_pattern=('*GC*tsv.gz', ), methylation_pattern_names=(), transcriptome_pattern=('*tsv', ), transcriptome_pattern_names=(), NOMe=False, threads=1, chromsizes=None, regions=None, blacklist=None, binsize=10000, project='linkapy', verbose=False)

   Linkapy_Parser creates matrices (arrow format for RNA, mtx format for accessibility / methylation)
   from directories containing processed multi-modal single-cell data.

   At least one option from each of the following items must be provided:

   - methylation_path and/or transcriptome_path
   - regions or chromsizes file (if methylation_path is provided).

   :param str methylation_path: The path to the methylation directory (will be searched recursively!).
   :param str transcriptome_path: The path to the RNA output directory (will be searched recursively!).
   :param str output: The output directory where matrices will be written to. Defaults to the folder 'linkapy_output' in the current working directory.
   :param tuple methylation_pattern: The glob pattern(s) used to search the methylation path recursively. Defaults to ``('*GC*tsv.gz',)``. Note that this is a tuple.
   :param tuple transcriptome_pattern: The glob pattern(s) used to search the transcriptome path recursively. Defaults to ``('*tsv',)``. Note that this is a tuple.
   :param bool NOMe: If set, methylation_path will be searched for NOMe-seq data, using the patterns ('GCHN', 'WCGN').
   :param int threads: Number of threads to use for parsing. Defaults to 1.
   :param str chromsizes: Path to the chromsizes file for the genome. If set, methylation signal will be aggregated over bins.
   :param tuple regions: Path or paths to bed files containing regions to aggregate methylation signal over. Can be gzipped. Note that this is a tuple.
   :param tuple blacklist: Path or paths to bed files containing regions to exclude from the aggregation. Can be gzipped. Note that this is a tuple.
   :param int binsize: Size of the bins to aggregate over. Only relevant if no regions are provided. Defaults to 10000.
   :param str project: Name of the project. Will be used as a prefix for the output files. Defaults to 'linkapy'.


   .. py:attribute:: output


   .. py:attribute:: project
      :value: 'linkapy'


   .. py:attribute:: logfile


   .. py:attribute:: logger


   .. py:attribute:: methylation_path


   .. py:attribute:: transcriptome_path


   .. py:attribute:: chromsizes


   .. py:attribute:: regions


   .. py:attribute:: blacklist


   .. py:attribute:: threads
      :value: 1


   .. py:attribute:: methylation_pattern
      :value: ('*GC*tsv.gz',)


   .. py:attribute:: methylation_pattern_names
      :value: ()


   .. py:attribute:: transcriptome_pattern
      :value: ('*tsv',)


   .. py:attribute:: transcriptome_pattern_names
      :value: ()


   .. py:attribute:: binsize
      :value: 10000


   .. py:method:: _validate()

      Validate the provided paths and parameters.


   .. py:method:: _glob()

      Discover files to aggregate over based on the paths and patterns provided.


   .. py:method:: parse()

      Parse the globbed files and create the different matrices and their corresponding metadata.


   .. py:method:: dump_mudata()


.. py:function:: parse_rna(files, prefix) -> None

   Read one or more featureCount files, combine them, and write them to a counts arrow file and a metadata arrow file.


.. py:function:: read_rna_to_anndata(prefix) -> anndata.AnnData

   From a prefix, read the count matrix and the metadata, and combine them into an AnnData object.


.. py:function:: read_meth_to_anndata(prefix) -> anndata.AnnData

   From a prefix, read the fraction matrices and their metadata, and combine them into an AnnData object.


.. py:function:: match_cells(_l: List[List[str]], patterns: List[str], logger) -> tuple[List[List[str]], pandas.DataFrame] | tuple[None, None]

   Take a list of lists containing putative cell names. For each list, a 'best match' is needed.
   This is necessary because an assay- or context-specific pre- or postfix is often used,
   and the names have to be matched for the mudata object.


.. py:function:: get_common_cellname(cellnames: List[str]) -> str | float
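
The matching problem described for ``match_cells`` (cell names that differ only by an assay-specific pre- or postfix) can be illustrated with a small standalone sketch. This is not the implementation used by linkapy: the helper names ``longest_common_substring`` and ``common_cellname_sketch`` are hypothetical, the example file names are made up, and returning NaN when nothing is shared is only an assumption suggested by the ``str | float`` return annotation of ``get_common_cellname``.

```python
from __future__ import annotations

import math
from difflib import SequenceMatcher
from functools import reduce


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous block of characters shared by a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def common_cellname_sketch(cellnames: list[str]) -> str | float:
    """Hypothetical helper: the shared core of several assay-specific names.

    Returns NaN when the names share nothing (an assumption, see lead-in).
    """
    if not cellnames:
        return math.nan
    # Pairwise reduction is a heuristic, not an exact multi-string search.
    core = reduce(longest_common_substring, cellnames)
    # Drop separators left behind by the removed pre- or postfixes.
    core = core.strip("_-.")
    return core if core else math.nan


# Made-up names: one RNA sample and two NOMe-seq contexts for the same cell.
names = ["RNA_cell01_plateA", "cell01_plateA_GCHN", "WCGN_cell01_plateA"]
print(common_cellname_sketch(names))
```

A longest-common-substring heuristic is easy to reason about, but it can over-match when cell identifiers themselves share long stretches (for example ``cell01`` and ``cell011``), so a real matcher would likely need the patterns passed to ``match_cells`` to strip known affixes first.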