Analysis tasks
==============

.. _compileoutput-label:

compileoutput
-------------

BypassAnalyzedData
^^^^^^^^^^^^^^^^^^

| Description:
| Creates a copy of a user-selected .csv file. This enables easy integration
 with any of the downstream analysis tasks

| Parameters:
| * source\_file -- Path to file that should be copied
| * overwrite -- Boolean to indicate whether to overwrite any files present in this analysis task

AggregateMERlinData
^^^^^^^^^^^^^^^^^^^
| Description:
| Loads the output of a selected analysis task from MERlin for each dataset included
 in the metaDataSet and combines them into a single, vertically concatenated file.

| Parameters:
| * task\_to\_aggregate -- Name of the MERlin task for which the analysis result should be concatentated
| * overwrite -- Boolean to indicate whether to overwrite any files present in this analysis task

cluster
-------------------

Clustering
^^^^^^^^^^
| Description:
| performs graph-based clustering of cells based on the provided data, most
 typically expression measurements. This method assumes that you want to use
 every entry in your input dataset for clustering

| Parameters:
| * file\_creation\_task -- Name of the :ref:`compileoutput` task that contains the expression data
| * prior\_clustering -- Name of the :ref:`Clustering` task, if any, that preceded this round of clustering
 (this is typically used when reclustering a subset of cells that were partitioned out in a prior clustering round.
| * cell\_type -- The name of the type of cells that are to be clustered (used in conjunction with prior\_clustering, see examples/cellTypeAnnotations.csv).
| * k\_values -- A list of the k values to use when constructing the k-nearest neighbor graph
| * resolutions -- A list of the resolutions to use for clustering
| * use\_PCs -- Boolean indicating whether to reduce dimensionality with PCA. Only PCs explaining more variance than the 1st PC of a randomized version of the dataset are kept
| * cluster\_min\_size -- Minimum number of cells that must be in a cluster
| * clustering\_algorithm -- Modularity optimization algorithm to use, either leiden or louvain

BootstrapClustering
^^^^^^^^^^^^^^^^^^^
| Description:
| Performs clustering on randomly downsampled (in terms of rows) instance of the input data matrix.

| Parameters:
| * bootstrap\_frac -- Fraction of rows to retain
| * bootstraps -- Number of different downsamplings to analyze for a given k value and resolution pairing
| * cluster\_task -- Name of the :ref:`Clustering` task that this bootstrap analysis is associated with (this
 ensures that the same set of clustering parameters are used between that clustering and the bootstrap clusterings).

ClusterStabilityAnalysis
^^^^^^^^^^^^^^^^^^^^^^^^
| Description:
| Calculates the stability of clusters based on jaccard similarity of the most
 pair of clusters from a full clustering result and a bootstrap clustering result when
 analyzed with the same k value and resolution, for all k value and resolution pairs analyzed.
 The clustering parameters that yielded the clustering result that contains at least 90% of
 the cells (or whatever is set as min\_fraction\_cells) in stable clusters and also yielded
 the greatest number of clusters is selected, and the full stability metrics are output to a table.
 If a cellTypeAnnotations.csv file is placed in the base directory of this analysis task it can be used
 to retrieve a subset of cells based on the labels on that annotation and the selected clustering result.

| Parameters:
| * min\_fraction\_cells: Minimum number of cells that must be in stable clusters for a result to be considered
| * cluster\_task: Name of the :ref:`Clustering` task to get results from
| * bootstrap_cluster_task: name of the :ref:`BootstrapClustering` task to get results from
| * kValues\_to\_consider: list of k values to use in stability analysis if restricting to a subset of those analysed in the clustering
| * resolutions\_to\_consider: list of resolutions to use in stability analysis if restricting to a subset of those analysed in the clustering