
I have a fairly high-level question about Python and running interactive simulations. Here is the setup:

I am porting to Python some simulation software I originally wrote in Smalltalk (VW). It is a kind of Recurrent Neural Network controlled interactively from a graphical interface. The interface allows the manipulation of most of the network's parameters in real time, in addition to controlling the simulation itself (starting it, stopping it, etc.). In the original Smalltalk implementation, I had two processes running with different priority levels:

  1. The interface itself with a higher priority
  2. The neural network running forever at a lower priority

Communication between the two processes was trivial, because all Smalltalk processes share the same address space (the Object memory).

I am now starting to realize that replicating a similar setup in Python is not so trivial. The threading module does not allow its threads to share address space, as far as I can tell. The multiprocessing module does, but in a rather complex way (with Queues, etc).

So I am starting to think that my Smalltalk perspective is leading me astray and that I am approaching a relatively simple problem from the wrong angle altogether. The problem is, I don't know what the right angle is! How would you recommend I approach the problem? I am fairly new to Python (obviously) and more than willing to learn. But I would greatly appreciate suggestions on how to frame the issues and which multiprocessing modules (if any!) I should delve into.

Thanks,

Stefano

    "The threading module does not allow its threads to share address space" -- Where did you get this idea from? Any Python Thread can access any memory from any other Thread in the same process. Though it still needs concurrency protective measures to do so safely. – Pyrce Apr 15 '13 at 17:05
  • one thing to keep in mind is that threads in python (at least using the reference python implementation) do not run concurrently. – cmd Apr 15 '13 at 18:16
  • also sockets in python are fairly trivial – cmd Apr 15 '13 at 18:17
  • ""The threading module does not allow its threads to share address space" -- Where did you get this idea from? " – stefano Apr 15 '13 at 21:57
  • @Pyrce: Notice that I added 'as far as I can tell.' From the discussions I read about the GIL I drew the conclusion that you cannot (1) have a process (thread) that runs a never-ending task that involves constantly updating some (fairly complex) data structure and (2) a second process that runs concurrently and with higher priority and which updates the same data structures. Am I wrong? Can you point me to examples on how to do (1) and (2)? (See the sketch just after these comments.) I looked into the threading and multiprocessing modules and could not find a way to do it. – stefano Apr 15 '13 at 22:05
  • @cmd: threads do run concurrently but not always in parallel. – jfs May 16 '13 at 15:57
  • @stefano: The word thread suggests common address space. Processes usually require explicit action to share state. Here's [gtk tree example: gui (main thread) show file tree while it is loaded concurrently from disk in the background thread](http://askubuntu.com/a/183315/3712). – jfs May 16 '13 at 16:08
  • @J.F.Sebastian Concurrently = "at the same time as something else". python threads do NOT run at the same time. – cmd May 16 '13 at 16:33
  • @cmd: See [Concurrent computing](http://en.wikipedia.org/wiki/Concurrent_computing). Notice the word *"may"* in the description. You can run code concurrently even without threads e.g., `gevent` greenlets. In addition CPython releases GIL on I/O and various C extensions such as numpy can release GIL so you can also get "at the same time" with threads. – jfs May 16 '13 at 16:50
  • @J.F.Sebastian I made no mention of Concurrent computing. I said threads do not "run concurrently" in Python. Stop using your prestige to change the meaning of what I said. – cmd May 16 '13 at 19:25
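
Since the comments above ask for a concrete example of (1) and (2), here is a minimal sketch using the threading module (the names, values, and sleep intervals are invented for illustration; note also that Python's threading module exposes no priority levels):

import threading
import time
import random

# Shared "network parameters": both threads see the same dict,
# because threads in one process share one address space.
params = {'rate': 0.1, 'gain': 1.0}
lock = threading.Lock()
stop = threading.Event()

def simulation():
    # (1) A never-ending task that keeps updating shared state
    while not stop.is_set():
        with lock:  # protect the shared structure
            output = params['rate'] * params['gain']
        print('sim step, output = %g' % output)
        time.sleep(0.1)  # stand-in for one simulation step

sim = threading.Thread(target=simulation)
sim.start()

# (2) The main ("interface") thread updates the same structure
for _ in range(5):
    with lock:
        params['rate'] = random.uniform(0.0, 1.0)
    time.sleep(0.5)

stop.set()
sim.join()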

1 Answer


I'll offer my take on how to approach this problem. Within the multiprocessing module the Pipe and Queue IPC mechanisms are really the best way to go; in spite of the added complexity you allude to, it's worth learning how they work. The Pipe is fairly straightforward so I'll use that to illustrate.

Here's the code, followed by some explanation:

import sys
import os
import random
import time
import multiprocessing

class computing_task(multiprocessing.Process):
    def __init__(self, name, pipe):
        # call this before anything else
        multiprocessing.Process.__init__(self)

        # then any other initialization
        self.name = name
        self.ipcPipe = pipe
        self.number1 = 0.0
        self.number2 = 0.0
        sys.stdout.write('[%s] created: %f\n' % (self.name, self.number1))

    # Do some kind of computation
    def someComputation(self):
        try:
            count = 0
            while True:
                count += 1
                self.number1 = (random.uniform(0.0, 10.0)) * self.number2
                sys.stdout.write('[%s]\t%d \t%g \t%g\n' % (self.name, count, self.number1, self.number2))

                # Send result via pipe to parent process.
                # Can send lists, whatever - anything picklable.
                self.ipcPipe.send([self.name, self.number1])

                # Get new data from parent process
                newData = self.ipcPipe.recv()
                self.number2 = newData[0]

                time.sleep(0.5)
        except KeyboardInterrupt:
            return

    def run(self):
        sys.stdout.write('[%s] started ...  process id: %s\n' 
                         % (self.name, os.getpid()))        
        self.someComputation()

        # When done, send final update to parent process and close pipe.
        self.ipcPipe.send([self.name, self.number1])
        self.ipcPipe.close()
        sys.stdout.write('[%s] task completed: %f\n' % (self.name, self.number1))

def main():
    # Create pipe
    parent_conn, child_conn = multiprocessing.Pipe()

    # Instantiate an object which contains the computation
    # (give "child process pipe" to the object so it can phone home :) )
    computeTask = computing_task('foo', child_conn)

    # Start process
    computeTask.start()

    # Continually send and receive updates to/from the child process
    try:
        while True:
            # receive data from child process
            result = parent_conn.recv()
            print "recv: ", result

            # send new data to child process
            parent_conn.send([random.uniform(0.0, 1.0)])
    except KeyboardInterrupt:
        computeTask.join()
        parent_conn.close()
        print "joined, exiting"

if __name__ == "__main__":
    main()

I have encapsulated the computing to be done inside a class derived from Process. This isn't strictly necessary but makes the code easier to understand and extend, in most cases. From the main process you can start your computing task with the start() method on an instance of this class (this will start a separate process to run the contents of your object).
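
As an aside, if you'd rather not subclass Process, you can get the same effect by passing a plain function as the target; a stripped-down sketch (the compute function here is invented for illustration):

import multiprocessing

def compute(name, pipe):
    # the body of someComputation() would go here
    pipe.send([name, 0.0])
    pipe.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    worker = multiprocessing.Process(target=compute, args=('foo', child_conn))
    worker.start()
    print(parent_conn.recv())
    worker.join()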

As you can see, we use Pipe in the parent process to create two connectors ("ends" of the pipe) and give one to the child while the parent holds the other. Each of these connectors is a two-way communication mechanism between the processes holding the ends, with send() and recv() methods for doing what their names imply. In this example I've used the pipe to transmit lists of numbers and text, but in general you can send lists, tuples, objects, or anything that's picklable (i.e. serializable with Python's pickle facility). So you've got some latitude for what you send back and forth between processes.
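
For instance, all of the following round-trip through a connection unchanged (a toy illustration):

from multiprocessing import Pipe

a, b = Pipe()
a.send([1, 2.5, 'text'])           # a list of mixed values
a.send({'rate': 0.1, 'gain': 2})   # a dict of parameters
a.send(('tag', (3, 4)))            # nested tuples
print(b.recv())   # [1, 2.5, 'text']
print(b.recv())   # {'rate': 0.1, 'gain': 2}
print(b.recv())   # ('tag', (3, 4))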

So you set up your connectors, invoke start() on your new process, and you're off and computing. Here we're just multiplying two numbers, but you can see it's being done "interactively" in the subprocess with updates sent from the parent. Likewise the parent process is informed regularly of new results from the computing process.

Note that the connector's recv() method is blocking, i.e. if the other end hasn't sent anything yet, recv() will wait until something is there to read, and prevent anything else from happening in the meantime. So just be aware of that.
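
For example, the receive loop in main() above could be made non-blocking with the connection's poll() method, which waits at most a given timeout for data. A sketch of a drop-in replacement for that loop (untested against your network code; it reuses parent_conn, computeTask, and random from the example above):

    # inside main(), after computeTask.start():
    try:
        while True:
            if parent_conn.poll(0.1):  # wait up to 0.1 s for data
                result = parent_conn.recv()
                print('recv:', result)
                parent_conn.send([random.uniform(0.0, 1.0)])
            # ... do other periodic work here (e.g. service a GUI) ...
    except KeyboardInterrupt:
        computeTask.join()
        parent_conn.close()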

Hope this helps. Again, this is a barebones example and in real life you'll want to do more error handling, perhaps use poll() on the connection objects as sketched above, and so forth, but hopefully this conveys the major ideas and gets you started.

DMH
  • [updating a complex object tree (as in OPs case) using multiple processes would involve a lot of copying in CPython](http://stackoverflow.com/a/1269055/4279). – jfs May 16 '13 at 16:26
  • Thanks for the informative link, JF. Does this apply in OP's case? My reading of OP's question is that he needs to update a set of literal values ("the network's parameters") rather than a complex object shared between both processes (I infer that a shared-memory object was in his original implementation but he's moving away from that for language reasons, such as those you cite). And since he's apparently only using two processes, it seems to me a straightforward case of passing (e.g.) a list back and forth. Perhaps I've misunderstood his request. I'll let him comment. – DMH May 16 '13 at 19:12
  • Indeed, I do need to run a complex object, sorry for the misunderstanding. I wrote the interface in PyQt, and I ended up running two event loops in the two threads and adding a processEvents call (after a short delay) in the simulation thread. That worked, even though I had to jump through some hoops to properly set up the communication between the underlying, non-Qt object (the real network) and the PyQt interface. I documented my "solution" [here](http://stackoverflow.com/questions/16246796/how-to-stop-a-qthread-from-the-gui/16282567#16282567) – stefano May 24 '13 at 20:38
  • Glad you found a solution which works for you. I'll leave my answer here in case it's useful to someone else sometime. Happy computing! – DMH May 28 '13 at 14:51