Writing a Job Workflow

To run a Python-based workflow with the cluster.py library, first load the Conda environment module that matches your Python version:

For Python 2 pipelines:

module load rpbs_services/py2-rpbs

For Python 3 pipelines:

module load rpbs_services/py3-rpbs

Using the Python Cluster Library

The runTasks function

runTasks(command, args, tasks, tasks_from, environment_module, log_prefix, map_list, job_name, job_opts, joinFiles, progress, partition, account, qos, wait)

| Parameter | Description | Default |
|---|---|---|
| command | Path to the executable. | |
| args | List of string arguments to pass to the command. The symbol "map_item" is substituted with the current element from the list provided by map_list. | |
| tasks | Number of tasks to perform (job array). Each iteration increments the SLURM_ARRAY_TASK_ID environment variable by one. | 1 |
| tasks_from | Initial value for the SLURM_ARRAY_TASK_ID environment variable. | 1 |
| environment_module | Name of the environment/version to load. Should be specified in the same way as with module load. | |
| log_prefix | Prefix for the .log and .err files returned by Slurm. | slurm |
| map_list | Launches as many tasks as there are elements in the list. Each element is substituted for the "map_item" symbol provided in args. | |
| job_name | Name assigned to the job. | environment_module:command |
| job_opts | Additional configuration options for Slurm. | |
| joinFiles | Merge all log files into a single one. If set to False, one log file is created per task. | True |
| progress | Displays progress (Mobyle). | True |
| partition | Slurm partition where the job will run. | Partition of the main job |
| account | Slurm account where the job will run. | Account of the main job |
| qos | QoS (Quality of Service) specification for the job. | |
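
To make the relationship between tasks, tasks_from and SLURM_ARRAY_TASK_ID more concrete, here is a minimal sketch. It does not call cluster.py. The helper names array_task_ids and pick_input are hypothetical, and the sketch assumes that task IDs run consecutively from tasks_from, as the parameter descriptions suggest.

```python
import os


def array_task_ids(tasks=1, tasks_from=1):
    """Return the SLURM_ARRAY_TASK_ID values a job array would see.

    Assumption: IDs start at tasks_from and increase by one per task.
    """
    return list(range(tasks_from, tasks_from + tasks))


def pick_input(inputs, tasks_from=1, environ=None):
    """Select the input belonging to the current task.

    Reads SLURM_ARRAY_TASK_ID from the environment (or from the mapping
    passed as environ) and converts it to a zero-based list index.
    """
    environ = os.environ if environ is None else environ
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    return inputs[task_id - tasks_from]


if __name__ == "__main__":
    print(array_task_ids(tasks=3, tasks_from=5))  # [5, 6, 7]
```

A worker script launched with tasks=3 and tasks_from=5 could call pick_input(inputs, tasks_from=5) to find which of its three inputs to process.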

Example of a Simple Python Script Using the Cluster Library

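#!/usr/bin/env python

import cluster.cluster as cluster

# Placeholder inputs for illustration only; inside a service class these
# would come from self.options.seqFile and self.options.label.
seq_file = "sequence.fasta"
label = "my_label"

cmd = "PyPPP3Exec"

args = ["-s %s" % seq_file,
        "-l %s" % label,
        "-v"]

# Submit a single job that runs PyPPP3Exec in the given environment module.
cluster.runTasks(cmd, args, environment_module="pyppp3-light/1.0-rpbs", log_prefix="ppp")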
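
The sequence file and label in the example above have to come from somewhere. One possible way to collect them, sketched here with the standard argparse module, is to parse them from the command line. The option names -s/--seqFile and -l/--label and the helper build_args are assumptions chosen to mirror the example; they are not part of cluster.py.

```python
import argparse


def build_args(argv):
    """Parse command-line options and build the argument list for PyPPP3Exec."""
    parser = argparse.ArgumentParser(description="Build PyPPP3Exec arguments")
    parser.add_argument("-s", "--seqFile", required=True, help="input sequence file")
    parser.add_argument("-l", "--label", required=True, help="job label")
    options = parser.parse_args(argv)
    # Same layout as the args list passed to runTasks in the example.
    return ["-s %s" % options.seqFile, "-l %s" % options.label, "-v"]


if __name__ == "__main__":
    # File name and label are placeholders.
    print(build_args(["-s", "query.fasta", "-l", "demo"]))
```

The returned list can then be passed as the args parameter of runTasks.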

Example of a Script Iterating Over a List

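#!/usr/bin/env python

import os

import cluster.cluster as cluster

# ps_input_file, p_args and DFLT_BS_BANK are assumed to be defined
# earlier in the calling script (input path, parsed options, bank path).

cmd = "SiteAlignReverse_service"

# "map_item" is replaced by each element of map_list in turn.
args = ["map_item",
        "%s.pdbqt" % os.path.splitext(ps_input_file)[0],
        "%s" % p_args.scoring_method,
        "%s" % DFLT_BS_BANK]

pdbid_list = ['1kid', '2r7g', '1i27']

# One task is launched per PDB identifier in pdbid_list.
cluster.runTasks(cmd, args, environment_module="patchsearch/2.0", map_list=pdbid_list, log_prefix="sitealign")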
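
The effect of map_list on args can be illustrated with a small stand-alone sketch. It is not the library's implementation: the helper expand_map_item is hypothetical, the file name target.pdbqt is a placeholder, and the sketch assumes that every occurrence of "map_item" inside an argument string is replaced by the current element.

```python
def expand_map_item(args, map_list, symbol="map_item"):
    """Return one argument list per element of map_list.

    Assumption: each occurrence of symbol in an argument string is
    replaced by the current element, converted to a string.
    """
    return [[arg.replace(symbol, str(item)) for arg in args]
            for item in map_list]


if __name__ == "__main__":
    per_task = expand_map_item(["map_item", "target.pdbqt"],
                               ["1kid", "2r7g", "1i27"])
    for task_args in per_task:
        # Prints one line per task, e.g. "1kid target.pdbqt"
        print(" ".join(task_args))
```

With the three identifiers above, the sketch yields three argument lists, which matches the description of map_list launching as many tasks as there are elements in the list.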