
I need to run a long foobar.py process with Popen and parse its output with a multiprocessing process.

My problem is that sometimes I cannot wait for the parser to finish, so I need to daemonize the parser using the multiprocessing daemon property. I need the parser to be usable in both daemonic and non-daemonic modes. The docs also say that a daemonic process is not allowed to create child processes, so in that case I run the Popen process before the parser is forked (see the start method below).

import multiprocessing
import subprocess

class Parser(multiprocessing.Process):
    def __init__(self, daemon, output):
        super(Parser, self).__init__()
        self.daemon = daemon  # uses multiprocessing.Process's daemon setter
        self.output = output

    def start(self):
        if self.daemon:
            self.launchFoobar()  # foobar is launched before forking
        super(Parser, self).start()

    def run(self):
        if not self.daemon:
            self.launchFoobar()  # foobar is launched after forking
        self.parseFoobar()

    def launchFoobar(self):
        self.process = subprocess.Popen(["python", "foobar.py"],
                                        stdin=subprocess.PIPE,
                                        stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT)

    def parseFoobar(self):
        with open(self.output, "w+") as f:
            for line in iter(self.process.stdout):
                f.write(line)
                f.flush()
        self.process.wait()

Let's say here that foobar.py just waits a few seconds and prints something, and the parseFoobar method just writes that output to a file. In my case both functions are a lot more complex than this.
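For reference, a stand-in foobar.py along these lines might look like the following (the real script is in the linked gist; this sketch is an assumption, only meant to reproduce the "wait, then print" behavior):

```python
# foobar.py (stand-in): wait a little, then print a few lines
import sys
import time

def main(n=3, delay=1.0, out=sys.stdout):
    for i in range(n):
        time.sleep(delay)            # simulate a long-running job
        out.write("line %d\n" % i)
        out.flush()                  # push each line to the pipe promptly

if __name__ == "__main__":
    main()
```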

Running Parser(daemon=False, output="sync.txt").start() works fine, and there is some output in sync.txt. But running Parser(daemon=True, output="async.txt").start() does not produce anything in async.txt and seems to be blocked at the line for line in iter(self.process.stdout):, because the file is created but stays empty.

Why doesn't it work? How can I fix it?

You can find gists for parser.py and foobar.py for testing. Just run python parser.py and look at output files.

Edit: there are some tips in Django's daemonize methods

azmeuk

1 Answer


You're actually getting the exact behavior you want. You're creating a daemon Process, and then almost immediately exiting the main process. That doesn't give the daemon Process enough time to actually execute foobar.py and parse its output before it gets terminated. If you add a call to async.join() at the end of the program, you'll see that async.txt does get written.
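A minimal sketch of the point (function and file names here are assumptions, not the gist's actual code): without the join(), the parent can exit before the daemon writes anything; with it, the file is reliably written.

```python
import multiprocessing
import time

def parse(path):
    time.sleep(0.2)                  # stand-in for launching and parsing foobar
    with open(path, "w") as f:
        f.write("parsed\n")

def run_daemon(path="async.txt"):
    p = multiprocessing.Process(target=parse, args=(path,))
    p.daemon = True
    p.start()
    p.join()   # without this, the parent exits and the daemon is terminated mid-work
    with open(path) as f:
        return f.read()

if __name__ == "__main__":
    print(run_daemon())
```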

Also note that you can call subprocess.Popen from inside the daemon multiprocessing.Process. That note about daemon processes not being able to create subprocesses is actually just talking about creating child multiprocessing.Process objects. The limitation is there because the daemon Process won't be able to properly clean up child Process objects, but processes opened by subprocess.Popen don't get cleaned up when the parent exits, anyway.
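To illustrate, here is a minimal sketch (names are assumptions) of a daemon Process successfully calling subprocess.Popen; the parent waits on a Queue so the daemon isn't killed before it finishes:

```python
import multiprocessing
import subprocess
import sys

def worker(q):
    # Popen inside a daemon Process is fine; the restriction in the docs is
    # about nesting multiprocessing.Process objects, not about subprocesses.
    out = subprocess.check_output([sys.executable, "-c", "print('hi')"])
    q.put(out.decode().strip())

def demo():
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.daemon = True
    p.start()
    result = q.get()   # wait for the daemon's result before the parent exits
    p.join()
    return result

if __name__ == "__main__":
    print(demo())
```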

dano
  • What is the point of creating a daemon process if you call join() at the end? In this very case I need to create the process and just forget it. It is very likely that the parser dies after its parent. – azmeuk Sep 30 '14 at 13:52
  • @azmeuk That's fine, then you can do that. But then sometimes the daemon `Process` isn't going to do the work you expect it to. In this case, the parent is exiting before the daemon `Process` can parse the output of `foobar.py`. – dano Sep 30 '14 at 13:53
  • @azmeuk I would say that in general it's actually dangerous to have a daemon `Process` write to a file, because the file could easily end up in a corrupted state if the daemon is abruptly terminated in the middle of writing to it. – dano Sep 30 '14 at 13:55
  • The principle of a daemon process is that it can survive its parent, am I wrong? In that case how can it abruptly terminate? – azmeuk Sep 30 '14 at 13:57
  • @azmeuk No, daemon `multiprocessing.Process` objects are **terminated** as soon as the parent process exits. They don't live beyond the parent's lifetime. `multiprocessing.Process` is meant to be very similar to `threading.Thread`; a daemon `Thread` can't live beyond the life of its parent process, so neither does a `multiprocessing.Process`. – dano Sep 30 '14 at 13:59
  • I see. [It seems that](http://stackoverflow.com/questions/24694763/how-to-let-the-child-process-live-when-parent-process-exited) ```os._exit``` can be used for children to survive parents though. – azmeuk Sep 30 '14 at 14:03
  • @azmeuk Yes, just note that no cleanup of the parent whatsoever is done when you use that. I actually think that your non-daemon `Process` will live beyond the life of the parent if you use that approach. – dano Sep 30 '14 at 14:05