Usage principles
Constructing an analysis task file
MERCluster uses snakemake to coordinate the automated execution of analyses. To help the user construct and execute snakemake workflows, MERCluster reads a set of analysis tasks and their parameters in JSON format and converts them into a .Snakefile.
MERCluster looks for an entry called analysis_tasks and retrieves the list
associated with it. Each entry in the list is an analysis task along with any
parameters the user specifies for that task. An example can be found in
examples/analysis_tasks.json.
{
"analysis_tasks": [
...
]
}
An entry in the list for the execution of the mercluster.analysis.cluster.Clustering task could appear as follows:
{
"analysis_tasks": [
...
{
"task": "Clustering",
"module": "mercluster.analysis.cluster",
"parameters": {
"file_creation_task": "BypassAnalyzedData",
"k_values": [10,12,15,20],
"resolutions": [1,1.5,2]
}
},
...
]
}
This instructs MERCluster to perform Clustering using the values given in the
parameters entry of this task.
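The file can also be generated from a script rather than written by hand. The following is a minimal sketch using only Python's standard json module. The helper names build_analysis_tasks and write_analysis_tasks are hypothetical and are not part of MERCluster; the task entry is copied from the example above.

```python
import json

# Task entry copied from the example above.
clustering_task = {
    "task": "Clustering",
    "module": "mercluster.analysis.cluster",
    "parameters": {
        "file_creation_task": "BypassAnalyzedData",
        "k_values": [10, 12, 15, 20],
        "resolutions": [1, 1.5, 2],
    },
}


def build_analysis_tasks(tasks):
    """Wrap a list of task entries under the top-level "analysis_tasks" key.

    Hypothetical helper, not part of MERCluster.
    """
    return {"analysis_tasks": list(tasks)}


def write_analysis_tasks(tasks, path):
    """Serialize the task list to a JSON file at the given path.

    Hypothetical helper, not part of MERCluster.
    """
    with open(path, "w") as handle:
        json.dump(build_analysis_tasks(tasks), handle, indent=4)
```

Calling write_analysis_tasks([clustering_task], "/path/to/analysistasks.json") would produce a file that can then be passed to MERCluster with the -a flag. Writing the file through json.dump also guarantees that the result is syntactically valid JSON.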
Note
Every analysis task must have a unique name. To run the same analysis task
several times with different parameters within one workflow, assign an
analysis_name to the task.
...
{
"task": "Clustering",
"module": "mercluster.analysis.cluster",
"analysis_name": "Clustering_neurons",
"parameters": {
"file_creation_task": "BypassAnalyzedData",
"k_values": [10,12,15,20],
"resolutions": [1,1.5,2],
"cell_type": "Neurons"
}
},
{
"task": "Clustering",
"module": "mercluster.analysis.cluster",
"analysis_name": "Clustering_glia",
"parameters": {
"file_creation_task": "BypassAnalyzedData",
"k_values": [10,12,15,20],
"resolutions": [1,1.5,2],
"cell_type": "Glia"
}
},
...
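Before generating a workflow, it can be useful to confirm that no two entries share a name. The sketch below assumes that an entry without an analysis_name is identified by its task value; that fallback is an assumption made for illustration and is not confirmed by the text above. The functions effective_name and duplicate_names are hypothetical helpers, not part of MERCluster.

```python
from collections import Counter


def effective_name(entry):
    """Return the name that identifies a task entry.

    Assumption: when "analysis_name" is absent, the "task" value is used.
    """
    return entry.get("analysis_name", entry["task"])


def duplicate_names(entries):
    """Return a sorted list of names used by more than one entry."""
    counts = Counter(effective_name(entry) for entry in entries)
    return sorted(name for name, count in counts.items() if count > 1)


# Two Clustering entries distinguished by analysis_name, as in the note above.
named = [
    {"task": "Clustering", "analysis_name": "Clustering_neurons"},
    {"task": "Clustering", "analysis_name": "Clustering_glia"},
]

# The same task listed twice without analysis_name collides under the assumption.
unnamed = [{"task": "Clustering"}, {"task": "Clustering"}]
```

Here duplicate_names(named) returns an empty list, while duplicate_names(unnamed) reports "Clustering" as a repeated name.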
Include every task that you want to run. Tasks can be added to the file later if you want to extend the analysis of a metadataset. Snakemake automatically determines what needs to be run and does not re-run tasks that have already completed, so there is no need to remove completed tasks from the analysis tasks file.
To create a .Snakefile for a metadataset named metadataset1 that is composed of dataset1 and dataset2, run the following command:
python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 --generate-only -a /path/to/analysistasks.json
Automated execution of analysis tasks
After constructing a .Snakefile for a metadataset and a set of analysis tasks, the workflow can be executed by running:
python -m mercluster 'metadataset1'
Note
The construction and execution of a workflow can be performed with a single command:
python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 -a /path/to/analysistasks.json
Adding the --snakemake-parameters flag to these commands lets you pass
additional parameters to snakemake by providing the path to a JSON file that
contains them. The most typical parameters relate to executing jobs on an HPC.
Examples for an HPC running Slurm can be found in
examples/snakemake_params.json and examples/clusterconfig.json.
Execution of a selected task
A task can be executed outside of the snakemake workflow if desired. To do so,
provide the -t flag followed by the name of the task:
python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 -t Clustering -a /path/to/analysistasks.json
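When MERCluster is driven from another Python script, the commands shown in this section can be assembled as argument lists. The sketch below uses only the flags that appear above (--dataset-list, --generate-only, -t, -a, and --snakemake-parameters). The function mercluster_command and its keyword names are hypothetical and are not part of MERCluster, and whether every combination of flags is accepted by the command line is not verified here.

```python
import sys


def mercluster_command(metadataset, datasets=None, tasks_file=None,
                       generate_only=False, task=None,
                       snakemake_parameters=None):
    """Build the argument list for a `python -m mercluster` invocation.

    Hypothetical helper; it only arranges the flags shown in this section.
    """
    cmd = [sys.executable, "-m", "mercluster", metadataset]
    if datasets:
        cmd += ["--dataset-list", *datasets]
    if generate_only:
        # Only construct the .Snakefile, do not execute the workflow.
        cmd.append("--generate-only")
    if task:
        # Execute a single task outside of the snakemake workflow.
        cmd += ["-t", task]
    if tasks_file:
        cmd += ["-a", tasks_file]
    if snakemake_parameters:
        # Path to a JSON file with additional snakemake parameters.
        cmd += ["--snakemake-parameters", snakemake_parameters]
    return cmd


generate = mercluster_command(
    "metadataset1",
    datasets=["dataset1", "dataset2"],
    tasks_file="/path/to/analysistasks.json",
    generate_only=True,
)
single_task = mercluster_command(
    "metadataset1",
    datasets=["dataset1", "dataset2"],
    tasks_file="/path/to/analysistasks.json",
    task="Clustering",
)
```

Either list can be passed to subprocess.run(cmd, check=True) from the standard library to launch the command. Building an argument list instead of a shell string avoids quoting problems with paths that contain spaces.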