
I am having trouble updating a global tuple from worker processes with the multiprocessing module.

My code is reproduced below:

from multiprocessing import Pool

if __name__ == '__main__':
    jj = ()
    def f(x):
        global jj
        jj += (x*x,)

    # Section A
    #for ii in range(20):
    #    f(ii)
    #print (jj)

    # Section B
    pool = Pool(processes=4)
    pool.map(f, range(20))
    pool.close()  # join() raises ValueError unless the pool is closed first
    pool.join()
    print(jj)

If I run only Section B, jj is an empty tuple. However, if I run only Section A, I get a tuple of length 20.

Why is that so?

Shihab Khan
  • Please run only one of Section A or Section B at a time, and comment out the other. I use a tuple because my original code has to store a lot of variables, and I read that instantiating a tuple is much faster than instantiating a list. I am not calling these values anywhere in the code. – Shihab Khan Nov 22 '18 at 08:59

1 Answer

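As the question "Python multiprocessing global variable updates not returned to parent" explains, global state is not shared among processes. Each worker in the pool runs f against its own copy of jj, so the jj in the parent process is never modified and stays empty.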
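To see this concretely, here is a minimal sketch (not part of the original question; the names counter and bump are made up for illustration). Each worker increments its own copy of a global and reports its process ID, while the copy in the parent stays untouched:

```python
import os
from multiprocessing import Pool

counter = 0  # module-level global; every process works on its own copy


def bump(_):
    """Increment this process's copy of counter and report who did it."""
    global counter
    counter += 1
    return os.getpid(), counter


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        reports = pool.map(bump, range(20))

    worker_pids = {pid for pid, _ in reports}
    print("parent pid:", os.getpid())
    print("worker pids:", worker_pids)
    # The parent never ran bump(), so its own copy is unchanged.
    print("counter in parent:", counter)  # 0
```

The worker PIDs differ from the parent PID, and the only way the parent learns anything is through the values that bump returns.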

You can share state using, for example, multiprocessing.Queue.

from multiprocessing import Pool, Queue

if __name__ == "__main__":
    jj = ()
    q = Queue()

    def f(x):
        global jj
        jj += (x * x,)

    def f_multi(x):
        q.put(x * x)

    # Section A
    for ii in range(20):
        f(ii)
    print(jj)

    # Section B
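    n = 20
    pool = Pool(processes=4)
    pool.map(f_multi, range(n))
    pool.close()

    # Each call to f_multi puts exactly one item, so read back n items.
    # (A sentinel put by the parent is not guaranteed to arrive last,
    # because q.put() hands items to a background feeder thread.)
    items = [q.get() for _ in range(n)]

    print(tuple(items))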

To get the values in the same order as Section A produces, use print(tuple(sorted(items))) instead. The result of Section B is unordered because four processes work on the task concurrently and put their results on the queue as they finish.

Dušan Maďar
  • Thanks a lot. Actually @Dušan, the problem I am facing is that I don't have just a single variable to put in a Queue. I have a function that generates a lot of data, and I'm trying to store the data from each iteration so that I can use it later to validate my results. Is there an efficient way to do that? It seems to me that I'll have to do this for each of the variables. – Shihab Khan Nov 22 '18 at 09:34
  • You can put all the variables you want to return from your function into a `dict` and then put that dict on the `Queue`, e.g. `q.put({'data1': [1,2,3], 'data2': (0.5, 0.6)})`. – Dušan Maďar Nov 22 '18 at 09:43
  • This sounds like a good idea. If I can make the function return this dictionary then I don't even have to use a queue. – Shihab Khan Nov 22 '18 at 09:53
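
Following up on the last comment, here is a minimal sketch of that idea, assuming the function's outputs can be pickled. The function name simulate and the dictionary keys are hypothetical stand-ins for the real data. pool.map sends each return value back to the parent and returns them in the same order as the inputs, so no Queue and no sorting are needed:

```python
from multiprocessing import Pool


def simulate(x):
    """Hypothetical stand-in for a function that produces several values per call."""
    return {"input": x, "square": x * x, "halves": (x / 2, x / 4)}


if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # map() pickles each returned dict back to the parent and
        # keeps the results in the same order as the inputs.
        results = pool.map(simulate, range(20))

    # Pull one field out of every result, e.g. to rebuild the tuple of squares.
    squares = tuple(r["square"] for r in results)
    print(squares)
```

Defining the worker function at module level, rather than inside the `if __name__ == "__main__":` block, also lets the example run on platforms where worker processes are spawned rather than forked.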