
I know about the limitations of Twisted for multiprocess applications, but my question is different. I am not trying to run a server or client using multiple processes. I already have a running application that takes a number of directories and performs some operations on them. I want to divide the work in chunks, spawning a process with the same application for each subdirectory. I can do this by running the application multiple times from the shell and passing a different subdirectory as argument each time.

In `main` I have something like:

from multiprocessing import Pool
...
p = Pool(num_procs)
work_chunks = [work_chunk] * len(configs)
p.map(run_work_chunk, zip(work_chunks, configs))
p.close()
p.join()

where:

def run_work_chunk((work_chunk, config)):
    from os import getpid
    from twisted.internet import reactor
    d = work_chunk.configure(config)

    d.addCallback(lambda _: work_chunk.run())
    d.addErrback(handleLog)
    print "pid=", getpid(), "reactor=", id(reactor)
    reactor.run()
    return

class WorkChunk(object):
    ...
    def run(self):
        # do stuff
        ...
        reactor.stop()

Let's say num_procs is 2; then the output is something like:

pid=2 reactor=140612692700304
pid=6 reactor=140612692700304

and you never see any output from the workers handling the other chunks.

The problem is that when reactor.stop() is called, it stops the reactor for every worker, because each process appears to use the same reactor. I thought that when a new process is spawned the whole stack was copied, but in this case the reference to the reactor seems to be copied too, so all processes end up with the same reactor object.
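The same id shows up even in a bare fork, entirely outside Twisted. Here is a minimal POSIX-only sketch (not my real application) that forks and compares id() of one object in parent and child:

```python
# POSIX-only sketch: fork and compare id() of the same object in parent
# and child. The ids match because the child gets a copy of the parent's
# address space, so the object's address is the same in both processes.
import os

obj = object()
r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # child: send our view of id(obj) back to the parent over the pipe
    os.write(w, str(id(obj)).encode("ascii"))
    os._exit(0)
os.close(w)
child_id = int(os.read(r, 64))
os.waitpid(pid, 0)
print("parent id=%d child id=%d" % (id(obj), child_id))
```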

Is there a way to instantiate a different reactor object for each process? (as if it was really a completely different process and not a child process)

synack

1 Answer


Is there a way to instantiate a different reactor object for each process? (as if it was really a completely different process and not a child process)

If you really mean a separate process, then the best way is to run the code multiple times (and/or fork/exec to create new processes from your initial process).

There is no magic to managing multiple reactors; it's done the same way you run multiple programs in any other context.
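For instance, a minimal sketch of the fork/exec approach: launch one fresh interpreter per work chunk, so each child imports Twisted from scratch and builds its own, fully independent reactor. The `worker.py` script name and the subdirectory arguments below are placeholders, not part of your code:

```python
# Sketch: exec a brand-new Python interpreter per chunk. Each child process
# starts from a clean address space, so its reactor is fully independent.
import subprocess
import sys

def run_chunk_processes(cmds):
    """cmds: a list of argv lists; each is started as a separate process.

    Returns the exit code of each process, in order.
    """
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    return [p.wait() for p in procs]

# Hypothetical usage: one worker invocation per subdirectory.
# exit_codes = run_chunk_processes(
#     [[sys.executable, "worker.py", d] for d in ("dir_a", "dir_b")])
```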

Mike Lutz
  • Thanks for the answer. Isn't that what I'm doing in the code in my question? On Unix, the multiprocessing module forks to create new processes. The thing is that even though I fork, the reactor object appears to be shared among the forked processes... – synack Jun 04 '15 at 12:54
  • @mjm: I haven't worked with the multiprocessing Python lib, but judging by its docs I agree that it should be doing something along the lines of what I'm talking about. Still, take a look at this SO question: http://stackoverflow.com/questions/5715217/mix-python-twisted-with-multiprocessing I suspect something non-Twisted-friendly is going on in the multiprocessing lib's behavior. – Mike Lutz Jun 04 '15 at 14:05
  • And see: http://stackoverflow.com/questions/11272874/is-twisted-incompatible-with-multiprocessing-events-and-queues . I think this might explain the problem you're seeing (particularly where Jean-Paul talks about the issue of not running an `exec()`). – Mike Lutz Jun 04 '15 at 14:19
  • Thanks for the links. I had already read them. That is what I was referring to with `I know about the limitations of Twisted for multiprocess applications`. My problem is that I'm not using multiprocessing for the internals of the twisted application at all. I'm using multiprocessing to spawn the application as a whole in a different process.... – synack Jun 04 '15 at 14:21
  • If your code (directly or via the library) is `exec`ing the Twisted code you should be fine. If not, as Jean-Paul explained, you're in a world of dangerous shared state. FYI: if you have reviewed other SO questions to pose your own, it saves answerers' time if you list them in your question. – Mike Lutz Jun 04 '15 at 14:28
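As a footnote: on Python 3.4+, multiprocessing can be asked for fork+exec semantics directly via the "spawn" start method, so each Pool worker begins life as a freshly exec'd interpreter, and a Twisted import done inside the worker builds its own reactor. A sketch (Python 3 only; the worker body below is a placeholder, not your application code):

```python
# Python 3.4+ sketch: the "spawn" start method makes each Pool worker a
# freshly exec'd interpreter rather than a fork of the parent process.
import multiprocessing as mp
import os

def worker(tag):
    # A real worker would do `from twisted.internet import reactor` here,
    # inside the new interpreter, and run/stop it as usual.
    return (tag, os.getpid())

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(worker, ["chunk-a", "chunk-b"]))
```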