
I've been spending some time trying to understand multiprocessing, though its finer points evade my untrained mind. I've been able to get a pool to return a simple integer, but I get stuck when the function doesn't just return a result the way all of the examples I can find do (even the one in the documentation is an obscure example I can't quite understand).

Here is an example I'm trying to get working, but I can't get it to behave as intended, and I'm sure there's a simple reason why. I may need to use a queue or shared memory or a manager, but however many times I read the documentation I can't wrap my brain around what each one actually means and does. All I've been able to understand so far is the pool function.

Also, I'm using a class because I need to avoid global variables, as in this question's answer.

import random

class thisClass:
    def __init__(self):
        self.i = 0

def countSixes(myClassObject):
    newNum = random.randrange(0,10)
    #print(newNum) #this proves the function is being run if enabled
    if newNum == 6:
        myClassObject.i += 1

if __name__ == '__main__':
    import multiprocessing
    pool = multiprocessing.Pool(1) #use one core for now

    counter = thisClass()

    myList = list(range(1000))

    #it must be (args,) instead of just i, apparently
    async_results = [pool.apply_async(countSixes, (counter,)) for i in myList]

    for x in async_results:
        x.get(timeout=1)

    print(counter.i)

Can someone explain in dumb-dumb what needs to be done so I can finally understand what I'm missing and what it does?

squid808
    Rereading your question, I understand now that you thought using a class would avoid [this problem](http://stackoverflow.com/questions/2080660/python-multiprocessing). It won't. If you really want to share memory between processes (which the docs themselves advise against!) then you'll have to use `multiprocessing`'s built-in datatypes as described [here](http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes). – senderle Jun 15 '11 at 17:27

1 Answer


It took me a while to understand what you want to happen. The problem has to do with the way multiprocessing works. Basically, you need to write your program in a functional style, instead of relying on side-effects as you do now.

Right now, you're sending objects out to your pool to be modified and returning nothing from countSixes. That won't work with multiprocessing, because in order to sidestep the GIL, multiprocessing pickles counter and sends the resulting copy to a separate worker process with its own interpreter. So when you increment i, you're actually incrementing the copy's i, and then, because you return nothing, you are discarding it! The counter in the parent process is never touched, which is why the final print always shows 0.
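To see the copy semantics without starting any processes, here is a minimal sketch that round-trips an object through pickle, which is roughly what happens to an argument passed to apply_async. The names `original`, `worker_copy`, and `increment` are hypothetical, and `SimpleNamespace` is only a stand-in for an instance of the question's class.

```python
import pickle
from types import SimpleNamespace

# Hypothetical stand-in for an instance of the question's thisClass.
original = SimpleNamespace(i=0)


def increment(obj):
    """Mutate the object it is given and hand it back."""
    obj.i += 1
    return obj


# Serialising and deserialising yields a separate object, much like the
# copy a worker process receives.
worker_copy = pickle.loads(pickle.dumps(original))
returned = increment(worker_copy)

print(original.i)  # 0: the parent's object was never modified
print(returned.i)  # 1: only the copy changed, so it has to be returned
```

The only way the parent learns about the change is through the value the function returns.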

To do something useful, you have to return something from countSixes. Here's a simplified version of your code that does something similar to what you want. I left an argument in, just to show what you ought to be doing, but really this could be done with a zero-arg function.

import random

def countSixes(start):
    newNum = random.randrange(0,10)
    if newNum == 6:
        return start + 1
    else:
        return start

if __name__ == '__main__':
    import multiprocessing
    pool = multiprocessing.Pool(1) #use one core for now

    start = 0
    async_results = [pool.apply_async(countSixes, (start,)) for i in range(1000)]

    print(sum(r.get() for r in async_results))
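
As a rough sketch of the "no real argument" variant mentioned above, the same count could also be written with pool.map, letting each call return 0 or 1 and summing in the parent. The names `is_six` and `count_sixes_in_pool` are made up for this illustration, and the ignored parameter exists only because pool.map passes one item per call.

```python
import multiprocessing
import random


def is_six(_):
    """Return 1 if a single draw is a 6, else 0. The argument is ignored."""
    return 1 if random.randrange(0, 10) == 6 else 0


def count_sixes_in_pool(trials, processes=1):
    """Run `trials` draws in worker processes and add up what they return."""
    with multiprocessing.Pool(processes) as pool:
        # Each worker returns a plain int; the summing happens in the parent.
        return sum(pool.map(is_six, range(trials)))


if __name__ == '__main__':
    print(count_sixes_in_pool(1000))
```

All state lives in the return values here, so nothing needs to be shared between processes.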
senderle
    You sir are a gentleman and a scholar. I was able to use this to refashion my program to return a class instance (which is also why I needed classes) and now things are working! I could NOT have done it without your explanation! *dances* – squid808 Jun 16 '11 at 19:48