Usage principles

Importing MERCluster

Import MERCluster as:

import mercluster

Constructing an analysis task file

MERCluster uses Snakemake to coordinate the automated execution of analyses. To help the user construct and execute Snakemake workflows, MERCluster reads a set of analysis tasks and their parameters from a JSON file and converts them into a .Snakefile.

MERCluster looks for an entry called analysis_tasks and retrieves the list associated with it. Each entry in the list is an analysis task, along with any parameters the user specifies for that task. An example can be found in examples/analysis_tasks.json.

{
    "analysis_tasks": [
        ...
    ]
}

An entry in the list for the execution of the mercluster.analysis.cluster.Clustering task could appear as follows:

{
    "analysis_tasks": [
        ...
        {
            "task": "Clustering",
            "module": "mercluster.analysis.cluster",
            "parameters": {
                "file_creation_task": "BypassAnalyzedData",
                "k_values": [10,12,15,20],
                "resolutions": [1,1.5,2]
            }
        },
        ...
    ]
}
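If you prefer to generate the analysis tasks file from a script instead of writing it by hand, Python's standard json module is enough. The sketch below writes the Clustering entry shown above to a file. The output filename analysis_tasks.json is an assumption; the parameter values are copied unchanged from the example.

```python
import json
from pathlib import Path

# One analysis task entry, mirroring the Clustering example above.
clustering_task = {
    "task": "Clustering",
    "module": "mercluster.analysis.cluster",
    "parameters": {
        "file_creation_task": "BypassAnalyzedData",
        "k_values": [10, 12, 15, 20],
        "resolutions": [1, 1.5, 2],
    },
}

# MERCluster reads the list stored under the "analysis_tasks" key.
# Any additional task entries would be appended to this list.
analysis_tasks = {"analysis_tasks": [clustering_task]}

# Hypothetical output path; pass the path you actually use to the -a flag.
output_path = Path("analysis_tasks.json")
output_path.write_text(json.dumps(analysis_tasks, indent=4))
```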

This would instruct MERCluster to perform Clustering using the parameters contained within the parameters entry of this task.

Note

All analysis tasks must have a unique name. To run the same analysis task several times with different parameters within one workflow, assign each instance a distinct analysis_name:

...
{
    "task": "Clustering",
    "module": "mercluster.analysis.cluster",
    "analysis_name": "Clustering_neurons",
    "parameters": {
        "file_creation_task": "BypassAnalyzedData",
        "k_values": [10,12,15,20],
        "resolutions": [1,1.5,2],
        "cell_type": "Neurons"
    }
},
{
    "task": "Clustering",
    "module": "mercluster.analysis.cluster",
    "analysis_name": "Clustering_glia",
    "parameters": {
        "file_creation_task": "BypassAnalyzedData",
        "k_values": [10,12,15,20],
        "resolutions": [1,1.5,2],
        "cell_type": "Glia"
    }
},
...
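Duplicate names are easy to introduce when copying entries, so a quick pre-flight check can catch them before a Snakefile is generated. This sketch assumes that a task's name is its analysis_name when one is given and its task value otherwise; that rule is inferred from the note above rather than taken from MERCluster's code, and find_duplicate_names is a hypothetical helper.

```python
import json
from collections import Counter


def effective_name(entry):
    # Assumption: analysis_name, when present, overrides the task name.
    return entry.get("analysis_name", entry["task"])


def find_duplicate_names(config):
    """Return the sorted names used by more than one entry in config["analysis_tasks"]."""
    counts = Counter(effective_name(entry) for entry in config["analysis_tasks"])
    return sorted(name for name, count in counts.items() if count > 1)


# module and parameters entries are omitted here for brevity.
config = json.loads("""
{
    "analysis_tasks": [
        {"task": "Clustering", "analysis_name": "Clustering_neurons"},
        {"task": "Clustering", "analysis_name": "Clustering_glia"},
        {"task": "Clustering"},
        {"task": "Clustering"}
    ]
}
""")
print(find_duplicate_names(config))  # ['Clustering']
```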

All tasks that the user wants to run should be included in the analysis tasks file, and further tasks can be added later to extend the analyses of a metadataset. Snakemake automatically determines what needs to be run and does not re-run tasks that have already completed, so there is no need to remove completed tasks from the analysis tasks file.
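The idea of skipping finished work can be illustrated with a deliberately simplified model: a task counts as complete when a marker file for it exists, and only incomplete tasks are scheduled. This is not Snakemake's actual logic, which decides based on the outputs declared by each rule (for example, whether they are missing or older than their inputs). The marker-file layout, the task names, and the pending_tasks helper are all hypothetical.

```python
import tempfile
from pathlib import Path


def pending_tasks(task_names, done_dir):
    """Return, in order, the tasks that have no completion marker in done_dir.

    Hypothetical layout: a finished task leaves an empty '<task name>.done' file.
    """
    done_dir = Path(done_dir)
    return [name for name in task_names if not (done_dir / f"{name}.done").exists()]


with tempfile.TemporaryDirectory() as tmp:
    tasks = ["BypassAnalyzedData", "Clustering_neurons", "Clustering_glia"]
    # Pretend the first task already finished in an earlier run.
    (Path(tmp) / "BypassAnalyzedData.done").touch()
    remaining = pending_tasks(tasks, tmp)

print(remaining)  # ['Clustering_neurons', 'Clustering_glia']
```

Under this model, appending a new task to the list only adds that task to the pending set; completed tasks stay skipped, which mirrors why they do not need to be removed from the file.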

To create a .Snakefile for a metadataset named metadataset1 that is composed of dataset1 and dataset2, run the following command:

python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 --generate-only -a /path/to/analysistasks.json

Automated execution of analysis tasks

After constructing a .Snakefile for a metadataset and a set of analysis tasks, the workflow can be executed by running:

python -m mercluster 'metadataset1'

Note

The construction and execution of a workflow can be performed in one line:

python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 -a /path/to/analysistasks.json

Adding the --snakemake-parameters flag to these commands allows you to pass additional parameters to Snakemake by providing the path to a JSON file containing them. These parameters typically relate to executing jobs on an HPC. Examples for an HPC running Slurm can be found in examples/snakemake_params.json and examples/clusterconfig.json.
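When the same commands are run repeatedly, for example from a batch script, it can help to assemble them in Python. The sketch below uses only the flags documented above; the helper name mercluster_command is hypothetical, and using sys.executable assumes MERCluster is installed for the interpreter running the script.

```python
import sys


def mercluster_command(metadataset, datasets=None, analysis_tasks=None,
                       generate_only=False, snakemake_parameters=None):
    """Build an argument list for `python -m mercluster` from the documented flags."""
    cmd = [sys.executable, "-m", "mercluster", metadataset]
    if datasets:
        cmd += ["--dataset-list", *datasets]
    if generate_only:
        cmd.append("--generate-only")
    if analysis_tasks:
        cmd += ["-a", str(analysis_tasks)]
    if snakemake_parameters:
        cmd += ["--snakemake-parameters", str(snakemake_parameters)]
    return cmd


# Generate the Snakefile only.
generate = mercluster_command("metadataset1", ["dataset1", "dataset2"],
                              "/path/to/analysistasks.json", generate_only=True)
# Execute the workflow, passing extra Snakemake settings from a JSON file.
run = mercluster_command("metadataset1",
                         snakemake_parameters="examples/snakemake_params.json")
```

Each list can then be launched with subprocess.run(cmd, check=True), which raises CalledProcessError if MERCluster exits with a non-zero status. Passing a list rather than a single string also avoids shell-quoting issues with paths.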

Execution of a selected task

A task can also be executed outside of the Snakemake workflow. To do so, provide the -t flag followed by the name of the task:

python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 -t Clustering -a /path/to/analysistasks.json
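Before launching a single task with -t, it can be worth confirming that the name actually appears in the analysis tasks file, so that a typo is caught early. This is a hypothetical helper. It assumes the name given to -t is matched against analysis_name when present and against task otherwise; that is an inference from the naming note above, not documented MERCluster behavior.

```python
import json
from pathlib import Path


def task_names(config):
    # Assumption: an entry is identified by analysis_name if present, else by task.
    return [entry.get("analysis_name", entry["task"]) for entry in config["analysis_tasks"]]


def check_task_name(tasks_file, name):
    """Raise ValueError if `name` is not defined in the analysis tasks file."""
    config = json.loads(Path(tasks_file).read_text())
    names = task_names(config)
    if name not in names:
        raise ValueError(f"{name!r} not found in {tasks_file}; available: {names}")


# Example (hypothetical path):
# check_task_name("/path/to/analysistasks.json", "Clustering")
```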