Usage principles
----------------

Importing MERCluster
~~~~~~~~~~~~~~~~~~~~

Import MERCluster as:

.. code-block:: none

    import mercluster

Constructing an analysis task file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MERCluster uses snakemake_ to coordinate the automated execution of analyses. To help the user construct and execute snakemake workflows, MERCluster reads a set of analysis tasks and parameters in JSON format and converts them to a .Snakefile.

.. _snakemake: https://snakemake.readthedocs.io/en/stable/

MERCluster looks for an entry called ``analysis_tasks`` and retrieves the list associated with it. Each entry in the list is an analysis task along with any parameters the user specifies for that task (an example can be found in examples/analysis_tasks.json).

.. code-block:: none

    {
        "analysis_tasks": [
            ...
        ]
    }

An entry in the list for the execution of the mercluster.analysis.cluster.Clustering task could appear as follows:

.. code-block:: none

    {
        "analysis_tasks": [
            ...
            {
                "task": "Clustering",
                "module": "mercluster.analysis.cluster",
                "parameters": {
                    "file_creation_task": "BypassAnalyzedData",
                    "k_values": [10,12,15,20],
                    "resolutions": [1,1.5,2]
                }
            },
            ...
        ]
    }

This would instruct MERCluster to perform Clustering using the parameters contained within the ``parameters`` entry of this task.

.. note::
    All analysis tasks must have a unique name. If you want to run one analysis task multiple times with different parameters within the same workflow, you can assign an ``analysis_name`` to each instance of the task.

    .. code-block:: none

        ...
        {
            "task": "Clustering",
            "module": "mercluster.analysis.cluster",
            "analysis_name": "Clustering_neurons",
            "parameters": {
                "file_creation_task": "BypassAnalyzedData",
                "k_values": [10,12,15,20],
                "resolutions": [1,1.5,2],
                "cell_type": "Neurons"
            }
        },
        {
            "task": "Clustering",
            "module": "mercluster.analysis.cluster",
            "analysis_name": "Clustering_glia",
            "parameters": {
                "file_creation_task": "BypassAnalyzedData",
                "k_values": [10,12,15,20],
                "resolutions": [1,1.5,2],
                "cell_type": "Glia"
            }
        },
        ...

All tasks that the user wants to run should be included, and further tasks can be added later to extend the analyses of a metadataset. Snakemake automatically determines what needs to be run and will not re-run tasks that have already been completed, so there is no need to remove completed tasks from the analysis tasks file.

To create a .Snakefile for a metadataset named metadataset1 that is composed of dataset1 and dataset2, run the following command:

.. code-block:: none

    python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 --generate-only -a /path/to/analysistasks.json

Automated execution of analysis tasks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After constructing a .Snakefile for a metadataset and a set of analysis tasks, the workflow can be executed by running:

.. code-block:: none

    python -m mercluster 'metadataset1'

.. note::
    The construction and execution of a workflow can be performed in one line:

    .. code-block:: none

        python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 -a /path/to/analysistasks.json

Adding the ``--snakemake-parameters`` flag to these commands allows you to pass additional parameters to snakemake by providing a path to a JSON file containing them. The most typical parameters to pass are those related to executing jobs on an HPC.
Examples of these for execution on an HPC running Slurm can be found in examples/snakemake_params.json and examples/clusterconfig.json.

Execution of a selected task
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A task can be executed outside of the snakemake workflow if desired. To do so, provide the ``-t`` flag and the name of the task.

.. code-block:: none

    python -m mercluster 'metadataset1' --dataset-list dataset1 dataset2 -t Clustering -a /path/to/analysistasks.json
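
Because every analysis task needs a unique name, it can help to generate the analysis tasks file from a short script rather than editing the JSON by hand. The following is a minimal sketch using only the Python standard library, not part of MERCluster itself. The helper names (``effective_name``, ``duplicate_names``, ``clustering_task``, ``build_task_file``) are hypothetical, and the rule that a task's name falls back to its ``task`` entry when no ``analysis_name`` is given is an assumption made for the uniqueness check.

```python
import json
from collections import Counter


def effective_name(task):
    """Name used for the uniqueness check: ``analysis_name`` if given, else ``task``.

    Assumption: this fallback mirrors how MERCluster names tasks; it is not
    confirmed by the documentation above.
    """
    return task.get("analysis_name", task["task"])


def duplicate_names(tasks):
    """Return the sorted list of names that occur more than once."""
    counts = Counter(effective_name(t) for t in tasks)
    return sorted(name for name, n in counts.items() if n > 1)


def clustering_task(cell_type):
    """Build one Clustering entry using the keys shown in the examples above."""
    return {
        "task": "Clustering",
        "module": "mercluster.analysis.cluster",
        "analysis_name": "Clustering_" + cell_type.lower(),
        "parameters": {
            "file_creation_task": "BypassAnalyzedData",
            "k_values": [10, 12, 15, 20],
            "resolutions": [1, 1.5, 2],
            "cell_type": cell_type,
        },
    }


def build_task_file(tasks):
    """Serialize the tasks to JSON text, refusing duplicate names."""
    dupes = duplicate_names(tasks)
    if dupes:
        raise ValueError("duplicate analysis task names: " + ", ".join(dupes))
    return json.dumps({"analysis_tasks": tasks}, indent=4)


# Two Clustering runs that differ only in cell type and analysis_name.
tasks = [clustering_task("Neurons"), clustering_task("Glia")]
task_file_text = build_task_file(tasks)
print(task_file_text)
```

The printed text can be saved to a file and passed to MERCluster with the ``-a`` flag as in the commands above. Serializing with ``json.dumps`` also avoids hand-editing mistakes such as missing commas between entries.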