Writing a Job Workflow
To execute a Python-based workflow using the cluster.py library, you need to load the relevant Conda environment, depending on your Python version:
For Python 2 pipelines:
module load rpbs_services/py2-rpbs
For Python 3 pipelines:
module load rpbs_services/py3-rpbs
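Loading the module that does not match a pipeline's interpreter is an easy mistake. A script can map its own Python major version to the module it expects. The following is a minimal sketch: the helper name `expected_rpbs_module` is hypothetical, and it only knows the two module names listed above.

```python
import sys

# Hypothetical helper: map a Python major version to the matching
# rpbs_services module named in this documentation.
RPBS_MODULES = {
    2: "rpbs_services/py2-rpbs",
    3: "rpbs_services/py3-rpbs",
}


def expected_rpbs_module(major=None):
    """Return the module to load for the given (or current) Python major version."""
    if major is None:
        major = sys.version_info[0]
    try:
        return RPBS_MODULES[major]
    except KeyError:
        raise ValueError("No rpbs_services module documented for Python %d" % major)


if __name__ == "__main__":
    print("This interpreter expects: module load %s" % expected_rpbs_module())
```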
Using the Python Cluster Library
The runTasks function
runTasks(command, args, tasks, tasks_from, environment_module, log_prefix, map_list, job_name, job_opts, joinFiles, progress, partition, account, qos, wait)
Parameter | Description | Default |
---|---|---|
command | Path to the executable. | |
args | List of string arguments to pass to the command. The symbol "map_item" is substituted with the current element from the list provided by map_list. | |
tasks | Number of tasks to perform (job array). Each successive task receives a SLURM_ARRAY_TASK_ID value one higher than the previous task. | 1 |
tasks_from | Initial value for the SLURM_ARRAY_TASK_ID environment variable. | 1 |
environment_module | Name of the environment/version to load, specified in the same way as with module load. | |
log_prefix | Prefix for the .log and .err files returned by Slurm. | slurm |
map_list | Launches as many tasks as there are elements in the list. Each element is substituted for the "map_item" symbol provided in args. | |
job_name | Name assigned to the job. | environment_module:command |
job_opts | Additional configuration options for Slurm. | |
joinFiles | Merge all log files into a single one. If set to False, one log file is created per task. | True |
progress | Displays progress (Mobyle). | True |
partition | Slurm partition where the job will run. | Partition of the main job |
account | Slurm account where the job will run. | Account of the main job |
qos | QoS (Quality of Service) specification for the job. | |
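When `tasks` is greater than 1, each task of the job array sees its own `SLURM_ARRAY_TASK_ID`, starting at `tasks_from`. The executable passed as `command` can therefore use this value to select its share of the work. The sketch below shows one possible worker-side approach. `INPUT_FILES` and `current_task_index` are hypothetical names, and the file names are placeholders.

```python
import os

# Hypothetical inputs: each task of the array processes one of these files.
INPUT_FILES = ["chunk_1.fasta", "chunk_2.fasta", "chunk_3.fasta"]


def current_task_index(tasks_from=1, environ=None):
    """Return a 0-based index for the running array task.

    SLURM_ARRAY_TASK_ID starts at tasks_from and increases by one per task,
    so subtracting tasks_from yields 0, 1, 2, ...
    """
    if environ is None:
        environ = os.environ
    task_id = environ.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:
        raise RuntimeError("SLURM_ARRAY_TASK_ID is not set; not running as an array task")
    return int(task_id) - tasks_from
```

Inside the worker, `INPUT_FILES[current_task_index(tasks_from=1)]` picks the file for the running task. The `tasks_from` value must match the one given to runTasks (1 by default), so that `tasks=3` yields indices 0 to 2.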
Example of a Simple Python Script Using the Cluster Library
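```python
#!/usr/bin/env python
import argparse
import cluster.cluster as cluster

# Parse the input options; option names here are illustrative
parser = argparse.ArgumentParser()
parser.add_argument("--seqFile", required=True)
parser.add_argument("--label", required=True)
options = parser.parse_args()
cmd = "PyPPP3Exec"
args = ["-s %s" % options.seqFile,
        "-l %s" % options.label,
        "-v"]
cluster.runTasks(cmd, args, environment_module="pyppp3-light/1.0-rpbs", log_prefix="ppp")
```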
Example of a Script Iterating Over a List
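```python
#!/usr/bin/env python
import argparse
import os
import cluster.cluster as cluster
# Hypothetical inputs: option names and the bank path are placeholders
DFLT_BS_BANK = "/path/to/default_bank"
parser = argparse.ArgumentParser()
parser.add_argument("ps_input_file")
parser.add_argument("--scoring_method", required=True)
p_args = parser.parse_args()
cmd = "SiteAlignReverse_service"
# "map_item" is replaced by each PDB ID from pdbid_list (one task per ID)
args = ["map_item",
        "%s.pdbqt" % os.path.splitext(p_args.ps_input_file)[0],
        "%s" % p_args.scoring_method,
        "%s" % DFLT_BS_BANK]
pdbid_list = ['1kid', '2r7g', '1i27']
cluster.runTasks(cmd, args, environment_module="patchsearch/2.0", map_list=pdbid_list, log_prefix="sitealign")
```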
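Before submitting, it can help to preview the argument lists that `map_list` will generate. The sketch below models the substitution described in the parameter table: each element of `map_list` replaces the "map_item" symbol in `args`, and one task is launched per element. `preview_map_tasks` is a hypothetical helper and is not part of cluster.py. It also assumes that "map_item" is replaced inside longer strings, which the library may not do.

```python
# Hypothetical dry-run helper; not part of cluster.py.
MAP_SYMBOL = "map_item"


def preview_map_tasks(command, args, map_list):
    """Return one command line (as a list of strings) per element of map_list.

    Assumes every occurrence of "map_item" in an argument is replaced by the
    current element, as described for the args and map_list parameters.
    """
    tasks = []
    for item in map_list:
        task_args = [arg.replace(MAP_SYMBOL, str(item)) for arg in args]
        tasks.append([command] + task_args)
    return tasks


if __name__ == "__main__":
    # Placeholder values standing in for the inputs of the example above.
    args = ["map_item", "query.pdbqt", "score", "/path/to/default_bank"]
    for line in preview_map_tasks("SiteAlignReverse_service", args, ["1kid", "2r7g", "1i27"]):
        print(" ".join(line))
```

With the three PDB IDs from the example, the preview lists three command lines, matching the three tasks that runTasks would launch.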