
I would like to submit jobs to a computer cluster via the scheduler SGE using a pipe:

$ echo -e 'date; sleep 2; date' | qsub -cwd -j y -V -q all.q -N test 

(The queue might be different depending on the particular cluster.)

Running this command-line in a bash terminal works for me on the cluster I have access to, with GNU bash version 3.2.25, GE version 6.2u5 and Linux 2.6 x86_64.

In Python 2.7.2, here are my commands (the whole script is available as a gist):

import subprocess
queue = "all.q"
jobName = "test"
cmd = "date; sleep 2; date"
echoArgs = ["echo", "-e", "'%s'" % cmd]
qsubArgs = ["qsub", "-cwd", "-j", "y", "-V", "-q", queue, "-N", jobName]

Case 1: using shell=True makes it work:

wholeCmd = " ".join(echoArgs) + " | " + " ".join(qsubArgs)
out = subprocess.Popen(wholeCmd, shell=True, stdout=subprocess.PIPE)
out = out.communicate()[0]
jobId = out.split()[2]

But I would like to avoid that for security reasons explained in the official documentation.
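For instance, if jobName ever came from an untrusted source, the shell=True variant would happily execute anything injected after a semicolon; a contrived sketch reusing the variables above, with a hypothetical malicious value:

# Contrived sketch (hypothetical malicious value): with shell=True, anything
# injected after ';' in jobName is run as an extra shell command.
jobName = "test; touch /tmp/oops"
qsubArgs = ["qsub", "-cwd", "-j", "y", "-V", "-q", queue, "-N", jobName]
wholeCmd = " ".join(echoArgs) + " | " + " ".join(qsubArgs)
print(wholeCmd)  # ... -N test; touch /tmp/oops  <- the 'touch' would run too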

Case 2: using the same code as above but with shell=False results in the following error message, so that the job is not even submitted:

Traceback (most recent call last):
  File "./test.py", line 22, in <module>
    out = subprocess.Popen(cmd, shell=False, stdout=subprocess.PIPE)
  File "/share/apps/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/share/apps/lib/python2.7/subprocess.py", line 1228, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Case 3: therefore, following the official documentation as well as this answer on SO, here is one proper way to do it:

echoProc = subprocess.Popen(echoArgs, stdout=subprocess.PIPE)
out = subprocess.check_output(qsubArgs, stdin=echoProc.stdout)
echoProc.wait()

The job is successfully submitted, but it returns the following error message:

/opt/gridengine/default/spool/compute-2-27/job_scripts/3873705: line 1: echo 3; date; sleep 2; date: command not found

This is something I don't understand.

Case 4: another proper way to do it, following this answer, is:

echoProc = subprocess.Popen(echoArgs, stdout=subprocess.PIPE)
qsubProc = subprocess.Popen(qsubArgs, stdin=echoProc.stdout, stdout=subprocess.PIPE)
echoProc.stdout.close()
out = qsubProc.communicate()[0]
echoProc.wait()

Here again the job is successfully submitted, but returns the following error message:

/opt/gridengine/default/spool/compute-2-32/job_scripts/3873706: line 1: echo 4; date; sleep 2; date: command not found

Did I make mistakes in my Python code? Could the problem come from the way Python or SGE were compiled and installed?


2 Answers


You're getting "command not found" because the literal single quotes are passed through by echo into the job script, so the shell running the script sees the single-quoted string 'echo 3; date; sleep 2; date' and interprets the whole thing as a single command name.

Just change this line:

echoArgs = ["echo", "-e", "'%s'" % cmd]

to:

echoArgs = ["echo", "-e", "%s" % cmd]

(I.e., remove the single quotes.) That should make both Case 3 and Case 4 work (though it will break 1 and 2).
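You can see the difference by printing what echo actually emits in each case (a minimal sketch, using the cmd from the question):

import subprocess

cmd = "date; sleep 2; date"
# With the extra quotes, echo prints them literally, so the job script
# submitted to SGE is: 'date; sleep 2; date' -- bash then treats that whole
# quoted string as one command name, hence "command not found".
print(subprocess.check_output(["echo", "-e", "'%s'" % cmd]))
# Without them, the job script is: date; sleep 2; date -- three commands.
print(subprocess.check_output(["echo", "-e", "%s" % cmd]))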


Your specific case could be implemented in Python 3 as:

#!/usr/bin/env python3
from subprocess import check_output

queue_name = "all.q"
job_name = "test"
cmd = b"date; sleep 2; date"
job_id = check_output('qsub -cwd -j y -V'.split() +
                      ['-q', queue_name, '-N', job_name],
                      input=cmd).split()[2]

You could adapt it for Python 2, using Popen.communicate().
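For example, a minimal Python 2 sketch of that adaptation (same names as above, untested) could look like:

#!/usr/bin/env python
from subprocess import Popen, PIPE

queue_name = "all.q"
job_name = "test"
cmd = "date; sleep 2; date"

# Feed the command on qsub's stdin and read the submission message back.
p = Popen('qsub -cwd -j y -V'.split() + ['-q', queue_name, '-N', job_name],
          stdin=PIPE, stdout=PIPE)
out, _ = p.communicate(cmd)
job_id = out.split()[2]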

As I understand it, whoever controls the input cmd can already run arbitrary commands, so there is not much point in avoiding shell=True here:

#!/usr/bin/env python
from pipes import quote as shell_quote
from subprocess import check_output

pipeline = 'echo -e {cmd} | qsub -cwd -j y -V -q {queue_name} -N {job_name}'
job_id = check_output(pipeline.format(
    cmd=shell_quote(cmd),
    queue_name=shell_quote(queue_name),
    job_name=shell_quote(job_name)),
                      shell=True).split()[2]

Implementing the pipeline by hand is error-prone. If you don't want to run the shell, you could use the plumbum module, which supports a similar pipeline syntax embedded in pure Python:

#!/usr/bin/env python
from plumbum.cmd import echo, qsub # $ pip install plumbum

qsub_args = '-cwd -j y -V -q'.split() + [queue_name, '-N', job_name]
job_id = (echo['-e', cmd] | qsub[qsub_args])().split()[2]
# or (qsub[qsub_args] << cmd)()

See How do I use subprocess.Popen to connect multiple processes by pipes?
