linkapy.parsing
===============

.. py:module:: linkapy.parsing


Classes
-------

.. autoapisummary::

   linkapy.parsing.Linkapy_Parser


Functions
---------

.. autoapisummary::

   linkapy.parsing.parse_rna
   linkapy.parsing.read_rna_to_anndata
   linkapy.parsing.read_meth_to_anndata
   linkapy.parsing.match_cells
   linkapy.parsing.get_common_cellname


Module Contents
---------------

.. py:class:: Linkapy_Parser(methylation_path=None, transcriptome_path=None, output='linkapy_output', methylation_pattern=('*GC*tsv.gz', ), methylation_pattern_names=(), transcriptome_pattern=('*tsv', ), transcriptome_pattern_names=(), NOMe=False, threads=1, chromsizes=None, regions=None, blacklist=None, binsize=10000, project='linkapy', verbose=False)

   Linkapy_Parser creates matrices (Arrow format for RNA, MTX format for accessibility / methylation)
   from directories containing processed multi-modal single-cell data.

   The following inputs are required:

   - ``methylation_path`` and/or ``transcriptome_path``;
   - ``regions`` or a ``chromsizes`` file, if ``methylation_path`` is provided.

   :param str methylation_path: The path to the methylation directory (will be searched recursively!).
   :param str transcriptome_path: The path to the RNA output directory (will be searched recursively!).
   :param str output: The output directory where matrices are written. Defaults to ``linkapy_output`` in the current working directory.
   :param tuple methylation_pattern: Glob pattern(s) used to search methylation_path recursively. Defaults to ``('*GC*tsv.gz',)``. Note that this is a tuple.
   :param tuple transcriptome_pattern: Glob pattern(s) used to search transcriptome_path recursively. Defaults to ``('*tsv',)``. Note that this is a tuple.
   :param bool NOMe: If set, methylation_path is searched for NOMe-seq data, using the patterns ('GCHN', 'WCGN').
   :param int threads: Number of threads to use for parsing. Defaults to 1.
   :param str chromsizes: Path to the chromsizes file for the genome. If set, methylation signal is aggregated over bins of size ``binsize``.
   :param tuple regions: Path or paths to bed files containing regions to aggregate methylation signal over. Can be gzipped. Note that this is a tuple.
   :param tuple blacklist: Path or paths to bed files containing regions to exclude from the aggregation. Can be gzipped. Note that this is a tuple.
   :param int binsize: Size of the bins to aggregate over. Only relevant if no regions are provided. Defaults to 10000.
   :param str project: Name of the project. Will be treated as a prefix for the output files. Defaults to 'linkapy'.
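
   When ``chromsizes`` is given and no ``regions`` are provided, the methylation signal is aggregated over
   fixed-size bins. The sketch below shows one way such bins could be derived; it is not linkapy's own code.
   It assumes the common two-column, whitespace-separated ``chrom size`` chromsizes layout and 0-based,
   half-open (BED-style) coordinates. ``make_bins`` is a hypothetical helper.

   .. code-block:: python

      from typing import Iterable, List, Tuple


      def make_bins(chromsizes: Iterable[str], binsize: int = 10000) -> List[Tuple[str, int, int]]:
          """Split every chromosome into (chrom, start, end) bins of ``binsize`` (hypothetical helper)."""
          if binsize <= 0:
              raise ValueError("binsize must be a positive integer")
          bins: List[Tuple[str, int, int]] = []
          for line in chromsizes:
              if not line.strip():
                  continue  # ignore blank lines
              chrom, size_field = line.split()[:2]
              length = int(size_field)
              # Half-open bins; the last bin is truncated at the chromosome end.
              for start in range(0, length, binsize):
                  bins.append((chrom, start, min(start + binsize, length)))
          return bins

   For example, with ``binsize=10000`` a chromosome of 25,000 positions yields three bins, the last one
   spanning 20,000 to 25,000.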


   .. py:attribute:: output


   .. py:attribute:: project
      :value: 'linkapy'


   .. py:attribute:: logfile


   .. py:attribute:: logger


   .. py:attribute:: methylation_path


   .. py:attribute:: transcriptome_path


   .. py:attribute:: chromsizes


   .. py:attribute:: regions


   .. py:attribute:: blacklist


   .. py:attribute:: threads
      :value: 1


   .. py:attribute:: methylation_pattern
      :value: ('*GC*tsv.gz',)


   .. py:attribute:: methylation_pattern_names
      :value: ()


   .. py:attribute:: transcriptome_pattern
      :value: ('*tsv',)


   .. py:attribute:: transcriptome_pattern_names
      :value: ()


   .. py:attribute:: binsize
      :value: 10000


   .. py:method:: _validate()

      Validate the provided paths and parameters.
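
      The input rules listed for the class (at least one data path, plus ``regions`` or ``chromsizes``
      whenever ``methylation_path`` is set) can be sketched as a standalone check. ``validate_inputs``
      is a hypothetical name, and checks that the paths actually exist are left out.

      .. code-block:: python

         from typing import Optional, Sequence


         def validate_inputs(
             methylation_path: Optional[str] = None,
             transcriptome_path: Optional[str] = None,
             chromsizes: Optional[str] = None,
             regions: Optional[Sequence[str]] = None,
         ) -> None:
             """Raise ValueError if the combination of inputs cannot be parsed (hypothetical helper)."""
             if methylation_path is None and transcriptome_path is None:
                 raise ValueError("Provide methylation_path and/or transcriptome_path.")
             # Methylation signal needs something to aggregate over: regions or genome bins.
             if methylation_path is not None and chromsizes is None and not regions:
                 raise ValueError("methylation_path requires regions or a chromsizes file.")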


   .. py:method:: _glob()

      Discover files to aggregate over based on the paths and patterns provided.
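
      A minimal sketch of recursive discovery with :mod:`pathlib`, assuming every pattern in the tuple
      is matched independently at any depth below the root. ``glob_recursive`` is hypothetical. The
      ``TypeError`` guard shows why the patterns must be a tuple: a bare string such as ``'*tsv'``
      would otherwise be iterated character by character.

      .. code-block:: python

         from pathlib import Path
         from typing import Iterable, List


         def glob_recursive(root: str, patterns: Iterable[str]) -> List[Path]:
             """Return all files below ``root`` matching any pattern, without duplicates (hypothetical)."""
             if isinstance(patterns, str):
                 raise TypeError("patterns must be a tuple of glob patterns, e.g. ('*tsv',)")
             hits = set()
             for pattern in patterns:
                 # rglob(pattern) behaves like glob('**/' + pattern): root and all subdirectories.
                 hits.update(p for p in Path(root).rglob(pattern) if p.is_file())
             return sorted(hits)

      Collecting hits in a set means a file matched by several overlapping patterns is returned once.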


   .. py:method:: parse()

      Parse the globbed files and create the different matrices and their corresponding metadata.


   .. py:method:: dump_mudata()


.. py:function:: parse_rna(files, prefix) -> None

   Read one or more featureCounts files, combine them, and write the result to a counts Arrow file and a metadata Arrow file, using ``prefix`` as the output file prefix.
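
   The combination step can be sketched with the standard library alone. This assumes the usual
   featureCounts table layout (``#`` comment lines, then a header of ``Geneid``, ``Chr``, ``Start``,
   ``End``, ``Strand``, ``Length`` and one count column per BAM file). ``combine_featurecounts`` is
   hypothetical, and writing the result to Arrow is omitted. Genes absent from a file get a count of 0.

   .. code-block:: python

      import csv
      from typing import Dict, Iterable, List, Tuple


      def combine_featurecounts(
          sources: Iterable[Iterable[str]],
      ) -> Tuple[List[str], List[str], List[List[int]]]:
          """Merge featureCounts tables into (gene_ids, sample_names, counts[gene][sample])."""
          genes: Dict[str, None] = {}  # insertion-ordered set of gene IDs
          samples: List[str] = []
          columns: List[Dict[str, int]] = []  # one {gene: count} mapping per sample
          for lines in sources:
              # Drop the "# Program:featureCounts ..." comment line(s) before parsing.
              reader = csv.reader((ln for ln in lines if not ln.startswith("#")), delimiter="\t")
              header = next(reader)
              # Columns 0-5 are annotation; every remaining column holds one sample's counts.
              new_cols: List[Dict[str, int]] = [{} for _ in header[6:]]
              samples.extend(header[6:])
              columns.extend(new_cols)
              for row in reader:
                  if not row:
                      continue  # skip blank lines
                  genes[row[0]] = None
                  for col, value in zip(new_cols, row[6:]):
                      col[row[0]] = int(value)
          gene_ids = list(genes)
          counts = [[col.get(g, 0) for col in columns] for g in gene_ids]
          return gene_ids, samples, counts

   Each source is any iterable of lines, so open file handles (``open(path)``) can be passed directly.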


.. py:function:: read_rna_to_anndata(prefix) -> anndata.AnnData

   Read the count matrix and metadata stored under ``prefix`` and combine them into an AnnData object.


.. py:function:: read_meth_to_anndata(prefix) -> anndata.AnnData

   Read the fraction matrices and their metadata stored under ``prefix`` and combine them into an AnnData object.
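
   As an illustration only: a common definition of a methylation fraction is methylated calls divided by
   total calls. The sketch below pools hypothetical per-site records ``(chrom, pos, n_methylated, n_total)``
   into fixed-size bins and returns NaN for bins without coverage. The record layout and ``bin_fractions``
   are assumptions and need not match the files linkapy parses.

   .. code-block:: python

      import math
      from collections import defaultdict
      from typing import Dict, Iterable, Tuple


      def bin_fractions(
          sites: Iterable[Tuple[str, int, int, int]], binsize: int = 10000
      ) -> Dict[Tuple[str, int], float]:
          """Map (chrom, bin_start) to pooled methylated / total calls (hypothetical helper)."""
          methylated: Dict[Tuple[str, int], int] = defaultdict(int)
          total: Dict[Tuple[str, int], int] = defaultdict(int)
          for chrom, pos, n_meth, n_total in sites:
              key = (chrom, pos // binsize * binsize)  # start of the bin containing pos
              methylated[key] += n_meth
              total[key] += n_total
          # Bins whose sites have zero coverage get NaN rather than 0.
          return {k: methylated[k] / total[k] if total[k] else math.nan for k in total}

   Pooling counts weights each site by its coverage, unlike averaging per-site fractions.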


.. py:function:: match_cells(_l: List[List[str]], patterns: List[str], logger) -> tuple[List[List[str]], pandas.DataFrame] | tuple[None, None]

   Take a list of lists of putative cell names and find a best match within each list.
   This is needed because cell names often carry an assay- or context-specific prefix or suffix, which must be reconciled before the modalities are combined into a MuData object.
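
   A simplified, hypothetical sketch of the matching idea: per modality, strip the prefix and suffix shared by
   all names (cut back to the nearest ``_``, ``.`` or ``-`` so partially shared IDs such as ``cell0`` are kept),
   then intersect the remaining core names. Unlike ``match_cells``, it takes no ``patterns`` or ``logger`` and
   returns only the shared names; ``core_names`` and ``shared_cells`` are illustrative helpers.

   .. code-block:: python

      import os
      from typing import List

      DELIMITERS = "_.-"


      def core_names(names: List[str]) -> List[str]:
          """Strip the affixes that every name in one modality shares."""
          if len(names) < 2:
              return list(names)  # a single name has no informative shared affix
          prefix = os.path.commonprefix(names)
          # Cut the prefix back to just after its last delimiter (or drop it entirely).
          cut = max(prefix.rfind(d) for d in DELIMITERS)
          prefix = prefix[: cut + 1]
          suffix = os.path.commonprefix([n[::-1] for n in names])[::-1]
          # Cut the suffix forward to its first delimiter (or drop it entirely).
          starts = [suffix.find(d) for d in DELIMITERS if d in suffix]
          suffix = suffix[min(starts):] if starts else ""
          return [n[len(prefix): len(n) - len(suffix)] for n in names]


      def shared_cells(per_modality: List[List[str]]) -> List[str]:
          """Return the core cell names present in every modality."""
          if not per_modality:
              return []
          cores = [set(core_names(names)) for names in per_modality]
          return sorted(set.intersection(*cores))

   For instance, RNA files ``rna_cell01.tsv``, ``rna_cell02.tsv`` and ``rna_cell03.tsv`` reduce to
   ``cell01`` to ``cell03``, methylation files ``cell01_GCHN.tsv.gz`` and ``cell02_GCHN.tsv.gz`` reduce to
   ``cell01`` and ``cell02``, and the shared cells are ``['cell01', 'cell02']``.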


.. py:function:: get_common_cellname(cellnames: List[str]) -> str | float
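
   This function has no docstring. One plausible reading of its ``str | float`` return type, assumed here,
   is that it returns the longest prefix shared by all names, or NaN when there is none. The hypothetical
   ``common_cellname`` below implements that reading with :func:`os.path.commonprefix`, which compares
   character by character, so trailing separators are trimmed afterwards.

   .. code-block:: python

      import math
      import os
      from typing import List, Union


      def common_cellname(cellnames: List[str]) -> Union[str, float]:
          """Longest shared prefix without trailing separators, or NaN if nothing is shared."""
          prefix = os.path.commonprefix(cellnames).rstrip("_.-")
          return prefix if prefix else math.nan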