I am submitting jobs to a queue on a cluster and want to check if the job is done. The way I do it is to see if the jobID is present in the output of a command (called jobs) that lists all the jobs that are currently running. I call jobs via the shell, parse its output and see if jobID is there. If it isn't, that's interpreted as a signal that the job terminated:

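    import subprocess
    import time

    sleep = 2  # seconds between polls
    while True:
        # Ask the `jobs` command about this job ID.
        output = subprocess.Popen("jobs %i" % jobID,
                                  shell=True,
                                  stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE).communicate()
        # communicate() returns a (stdout, stderr) tuple.
        if job_done(output):
            break
        time.sleep(sleep)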

Since sleep is set to 2, the check runs every two seconds, but the job might run for several hours. Occasionally, seemingly at random, I get `OSError: [Errno 12] Cannot allocate memory`, even though there's a ton of memory on the machine and the thread does nothing memory intensive except check the output of jobs. What could be causing this? Is there a better way to do this than to use Popen, PIPE and communicate?

This issue seems similar to the one reported here (Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory") but there was no resolution to this issue.

  • [The docs](https://docs.python.org/2/library/subprocess.html#popen-objects) suggest that communicate might very well run out of memory in a scenario like yours – Mr_and_Mrs_D Jun 22 '15 at 14:07
  • there are several solutions in the link that you've provided e.g., use a fork server. – jfs Jun 22 '15 at 16:43

1 Answer

Which Python version are you using: 2.6, 2.7, or something newer? What's the status of your file descriptors? See fd-issue.

At the bottom of the SO-post you mentioned there seems to be another one on the same issue. See also his proposal.
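
One rough way to answer the file-descriptor question without strace is to count the process's open descriptors before and after a batch of calls. The sketch below assumes a Linux-style `/proc` filesystem; `open_fd_count` and `run_once` are hypothetical helper names, and the child command is only a stand-in for the real `jobs` call.

```python
import os
import subprocess
import sys


def open_fd_count():
    """Return the number of file descriptors open in this process.

    Assumes a Linux-style /proc filesystem; returns None elsewhere.
    """
    try:
        return len(os.listdir("/proc/self/fd"))
    except OSError:
        return None


def run_once(argv):
    """Run a command without a shell and return (stdout, stderr)."""
    proc = subprocess.Popen(argv,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    return proc.communicate()


if __name__ == "__main__":
    before = open_fd_count()
    for _ in range(20):
        # Stand-in for the real `jobs <id>` call (assumption).
        run_once([sys.executable, "-c", "print('ok')"])
    after = open_fd_count()
    print("open fds before: %s, after: %s" % (before, after))
```

If the count keeps climbing across polls, descriptors are leaking somewhere; if it stays flat, the fd hunch is probably wrong and the strace output is the next place to look.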

Don Question
  • I am using Python 2.6.5. I read the post you linked to but I am not using `close_fds=True`, which that post said was the cause of the problem. Should I be using it? Even though I am using `PIPE`? –  Nov 08 '12 at 22:55
  • try an strace: `strace -o syscalls.trace -ttT ./yourscript.py` or, for a shorter summary, `strace -o syscalls.trace -Cf ./yourscript.py`. In the eyes of Python and the OS, a PIPE is a file-like object which consumes an fd – Don Question Nov 08 '12 at 23:04
  • what should I be looking for in `strace`? –  Nov 08 '12 at 23:08
  • for starters, just at the first odd behaviour and errors. The first ENOMEM or EBADF errors should point in the right direction. – Don Question Nov 08 '12 at 23:11
  • But is this the correct, fail-safe way to periodically check whether a process has a particular output in Python? Is there a different solution? –  Nov 08 '12 at 23:12
  • strace traces system calls, that's why it seems to me like the right choice if my hunch about the fd is correct. Alternatively, did you look into Celery? – Don Question Nov 08 '12 at 23:22
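
On the question of a different way to poll: a minimal sketch, not a confirmed fix for the ENOMEM error, is to pass the command as an argument list so that no intermediate shell process is spawned on each poll, and to poll less often. Here `poll_until_done`, `is_done`, `interval` and `max_polls` are hypothetical names, and `is_done` is assumed to take stdout and stderr separately rather than the single tuple that `job_done` receives in the question.

```python
import subprocess
import sys
import time


def poll_until_done(argv, is_done, interval=30.0, max_polls=None):
    """Run argv repeatedly until is_done(stdout, stderr) returns True.

    Returns the number of polls performed, or None if max_polls was
    reached first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        # No shell=True: the command is executed directly.
        # close_fds=True keeps the child from inheriting unrelated fds.
        proc = subprocess.Popen(argv,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                close_fds=True)
        stdout, stderr = proc.communicate()
        polls += 1
        if is_done(stdout, stderr):
            return polls
        time.sleep(interval)
    return None


if __name__ == "__main__":
    # Stand-in command (assumption); the real call would be
    # something like ["jobs", str(jobID)].
    argv = [sys.executable, "-c", "print('finished')"]
    print(poll_until_done(argv, lambda out, err: b"finished" in out,
                          interval=0))
```

For a job that runs for hours, an interval of tens of seconds cuts the number of process creations considerably compared with polling every two seconds.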