
In the example below, shell_script.sh submits a job to the cluster. Is it possible to make snakemake aware of that cluster job's completion? That is, file a should first be created by shell_script.sh, which submits its own job to the cluster; once that cluster job has completed, file b should be created.

For simplicity, let's assume that snakemake is run locally, meaning that the only cluster job is the one submitted by shell_script.sh, not by snakemake.

localrules: that_job

rule all:
    input:
        "output_from_shell_script.txt",
        "file_after_cluster_job.txt"

rule that_job:
    output:
        a = "output_from_shell_script.txt",
        b = "file_after_cluster_job.txt"
    shell:
        """
        shell_script.sh {output.a}
        touch {output.b}
        """

PS: At the moment, I am using a sleep command to add a waiting time before the job is considered "completed". This is an awful workaround that can cause several problems, for example when the job takes longer than the sleep.

Manavalan Gajapathy

1 Answer


Snakemake can manage this for you with the --cluster argument on the command line.
You can supply a template for the jobs to be executed on the cluster.
As an example, here is how I use snakemake on an SGE-managed cluster.

The template that encapsulates the jobs, which I called sge.sh:

#$ -S /bin/bash
#$ -cwd
#$ -V

{exec_job}

Then I run this directly on the login node:

snakemake -rp --cluster "qsub -e ./logs/ -o ./logs/" -j 20 --jobscript sge.sh --latency-wait 30

--cluster gives the command used to submit jobs to the queuing system (here, qsub)
--jobscript is the template in which jobs will be encapsulated
--latency-wait is important if the file system takes a while to make written files visible. Your job might end and return before the outputs of the rule are actually visible on the file system, which would cause an error

Note that, in the Snakefile, you can use the localrules keyword to specify rules that should not be executed on the nodes.

Otherwise, depending on your queuing system, some options exist to wait for a job sent to the cluster to finish:
SGE: Wait for set of qsub jobs to complete
SLURM: How to hold up a script until a slurm job (start with srun) is completely finished?
LSF: https://superuser.com/questions/46312/wait-for-one-or-all-lsf-jobs-to-complete
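
For the LSF case, the general pattern can be sketched as follows: make the submission blocking, run it in the background, and use the shell's wait builtin before doing anything that depends on the job. This is only a sketch under assumptions. The function name submit_blocking and the temporary directory are hypothetical, and the function runs the command locally so that the example works without a scheduler. On a real LSF cluster its body would presumably call bsub -K, which makes bsub wait until the job finishes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around a blocking submission command.
# On a real LSF cluster the body might be:  bsub -K "$@"
# Here the command simply runs locally, so no scheduler is needed.
submit_blocking() {
    "$@"
}

# Scratch directory used only for this illustration.
workdir=$(mktemp -d)

# Launch the "cluster job" in the background, as shell_script.sh might do.
submit_blocking sh -c "sleep 1; echo finished > '$workdir/output_from_shell_script.txt'" &
job_pid=$!

# Block until the background submission has returned.
# With set -e, a non-zero exit status from the job aborts the script here.
wait "$job_pid"

# Only now create the file that depends on the finished job.
touch "$workdir/file_after_cluster_job.txt"
echo "both files present in $workdir"
```

If shell_script.sh itself ends with a blocking submission like this, the rule's shell block returns only after the cluster job is done, so the sleep workaround should no longer be needed.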

Eric C.
  • Cluster job here is dictated by `shell_script.sh`, and I would like snakemake to be aware when this job is completed before proceeding to checking for output files created. I rewrote my question to better explain the problem. – Manavalan Gajapathy Apr 26 '18 at 16:20
  • I understand what you mean but if snakemake does not manage the jobs sent to the cluster with the `--cluster` argument on command line, there is no way to correctly handle it in a rule shell. You didn't show your script `shell_script.sh` so it is hard to answer. Which queuing system are you using? – Eric C. Apr 27 '18 at 08:18
  • Yeah, I also thought snakemake was not designed for this, but I wanted to try my luck. `shell_script.sh` is a bit of a complicated script that sends a job to an LSF cluster at the end. Ideally `shell_script.sh` should be rewritten as a snakefile, but I can't afford the time for that. – Manavalan Gajapathy Apr 27 '18 at 15:04
  • Edited my answer with some links that may be of interest – Eric C. Apr 27 '18 at 15:41
  • Your new suggestion looks quite interesting and capable of solving this problem. Thanks! – Manavalan Gajapathy Apr 27 '18 at 17:28
  • Finally tried [your solution](https://superuser.com/a/78470) to use `wait`, and it works like a charm. – Manavalan Gajapathy May 04 '18 at 20:30