
I am working on a tool that needs to run a parser and also produce a separate analysis log. Currently everything is driven through a web interface:

  1. The user goes to the form and submits a filename for parsing (the file is already on the system).
  2. The form submits the information to a Python CGI script.
  3. The Python CGI script runs and spawns a subprocess to do the parsing.
  4. The parser finds the appropriate information for analysis and spawns another subprocess.

In my code I am using:

import subprocess
...
subprocess.Popen(["./program.py", input])

I assumed from the documentation that this call does not wait for the child process to terminate; the script just keeps running. My CGI script that starts all this does:

subprocess.Popen(["./program.py", input])
# HTML generation code
# JavaScript to refresh after 1 second to a different page

The HTML generation code outputs just a status saying the request has been processed, and the JavaScript then refreshes back to the main homepage.
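Roughly, the script has this shape (a minimal sketch with placeholder names and paths, since I can't share the real code):

#!/usr/bin/env python
import subprocess

# Placeholder: the real filename comes from the submitted form.
input_file = "data/sample.log"

# Spawn the parser; I expected this call to return immediately.
subprocess.Popen(["./program.py", input_file])

# Status page, then JavaScript sends the user back to the homepage.
print("Content-Type: text/html\n")
print("<html><body>Your request has been processed.")
print('<script>setTimeout(function(){location.href="/index.html";}, 1000);</script>')
print("</body></html>")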

The Problem

The CGI page hangs until the subprocesses finish, which is not what I want. I thought Popen did not wait for the subprocesses to finish, but whenever I run this tool it stalls until everything is complete. I want the script to finish, let the subprocesses run in the background, and let the web pages keep functioning normally, so the user doesn't think everything has stalled behind a loading indicator.

I can't find any reason why Popen would behave this way; everything I've read says it does not wait, but it seems to.
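As a sanity check (a trivial test run from a shell, not through Apache), Popen really does return immediately:

import subprocess
import time

start = time.time()
p = subprocess.Popen(["sleep", "5"])   # child keeps running for 5 seconds
print("Popen returned after %.2f seconds" % (time.time() - start))
p.wait()   # only an explicit wait() (or communicate()) blocks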

Something else odd: the Apache logs show "Request body read timeout" before the script completes. Is Apache actually stalling the script, then?

Sorry I can't show the complete code as it's "confidential", but hopefully the logic is clear enough to be understood.

richardhsu

2 Answers


Apache probably waits for the child process to complete. You could try to daemonize the child (double fork, setsid), or better, just submit the job to a local service, e.g., by writing to a predefined file, using some message broker, or going through a higher-level interface such as Celery.
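For instance, a minimal double-fork sketch (POSIX-only, untested against your setup; spawn_detached is just an illustrative name):

import os

def spawn_detached(argv):
    # Classic Unix double fork: the grandchild is adopted by init,
    # so the web server is not left waiting on any descendant.
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)    # reap the short-lived first child
        return
    os.setsid()               # first child: start a new session
    if os.fork() > 0:
        os._exit(0)           # first child exits immediately
    # Grandchild: drop inherited stdio so Apache's pipes see EOF.
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    os.execvp(argv[0], argv)  # replace this process with the job

# e.g. spawn_detached(["./program.py", input_file])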

jfs
  • Thanks, I figured as much. I had stumbled upon this thread earlier and decided it was worth a try to use the whole command they had for Popen: http://stackoverflow.com/questions/546017/how-do-i-run-another-script-in-python-without-waiting-for-it-to-finish and for some reason it works now. Including the stdout and stderr parameters seems to allow the script to end without waiting for the subprocesses. Thanks! – richardhsu Jul 03 '12 at 19:38

Not sure exactly why this works, but I followed the answer in this thread: How do I run another script in Python without waiting for it to finish?

To do:

p = subprocess.Popen([sys.executable, '/path/to/script.py'], 
                     stdout=subprocess.PIPE, 
                     stderr=subprocess.STDOUT)

Instead of:

p = subprocess.Popen([sys.executable, '/path/to/script.py'])

And now, for some reason, the CGI script terminates and the subprocesses keep running. Any insight into why there is a difference would be helpful. I don't see why leaving those two parameters at their defaults would cause such a stall.

richardhsu
  • If script.py generates enough output it might block. Redirect stdin, stdout, and stderr to devnull instead (example: http://stackoverflow.com/a/11270665/4279). `close_fds=True` wouldn't hurt on POSIX systems. Are you sure that your code works? – jfs Jul 03 '12 at 19:53
  • Yeah, the CGI page doesn't show the loading symbol anymore, and the Apache logs no longer show the "Request body read timeout". I thought similarly that there would be some blocking if I specified the std outputs, but apparently adding those fixed it. No other changes were made :/ – richardhsu Jul 03 '12 at 20:10
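For reference, a sketch of what that comment suggests (input_file stands in for the real filename):

import os
import subprocess

input_file = "somefile"   # placeholder for the filename being parsed

devnull = open(os.devnull, "r+")
subprocess.Popen(["./program.py", input_file],
                 stdin=devnull, stdout=devnull, stderr=devnull,
                 close_fds=True)   # harmless, and helpful, on POSIX
devnull.close()   # the child keeps its own copies of the descriptors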