I have some sample code here that uses a global variable, and it's giving me errors. The global variable x is assigned in the test3 function before test2 is called, but test2 doesn't appear to see the definition of x.

from multiprocessing import Pool
import numpy as np

global x    

def test1(w, y):
    return w+y    

def test2(v):
    global x        # x is assigned value in test3 before test2 is called
    return test1(x, v)    

def test3():
    global x
    x = 2
    y = np.random.random(10)
    with Pool(processes=6) as p:
        z = p.map(test2, y)
    print(z)

if __name__ == '__main__':
    test3()

The error is:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "...\my_global_variable_testcode.py", line 23, in test2
    return test1(x, v)
NameError: name 'x' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "...\my_global_variable_testcode.py", line 35, in <module>
    test3()
  File "...\my_global_variable_testcode.py", line 31, in test3
    z = p.map(test2, y)
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\WinPython-64bit-3.5.2.1Qt5\python-3.5.2.amd64\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
NameError: name 'x' is not defined

I have looked at a lot of questions and answers on SO, but still haven't been able to figure out how to fix this code. I would be grateful if someone could point out what the issue with the code is.

Can anyone show me how to rewrite the code above, without changing the basic structure of code (i.e. retaining test1, test2, test3 as 3 separate functions, as in my original code these functions are quite long and complex) so that I can achieve my goal of multi-processing?

p.s. This sample code is just a simplified version of my actual code, and I am giving this simplified version here to figure out how to make global variables work (not trying to find a complicated way for 2+np.random.random(10)).

* EDIT * - BOUNTY DESCRIPTION

This bounty is for someone who can help me re-write this code, preserving the basic structure of functions in the code:

(i) test3 does the multiprocessing call to test2, and test2 in turn calls test1

(ii) makes use of either global variables or the Manager class of the multiprocessing module or anything else to avoid having test3 pass common variables to test2

(iii) test3 also assigns values to, or changes, the global variables / common data before calling the multiprocessing code

(iv) Code should work on Windows (as I am using Windows). Not looking for a solution that works only on Linux / OSX at this time.

To help with the bounty, let me give two different test cases.

* case 1 - non-multiprocessing version *

import numpy as np

x = 3

def test1(w, y):
    return w+y

def test2(v):
    global x
    print('x in test2 = ', x)
    return test1(x, v)

def test3():
    global x
    x = 2
    print('x in test3 = ', x)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    z = test2(y)
    print(z)

if __name__ == '__main__':
    test3()

The output (correct) is:

x in test3 =  2
x in test2 =  2
[ 3  4  5  6  7  8  9 10 11 12]

* case 2 - multi-processing version *

from multiprocessing import Pool
import numpy as np

x = 3

def test1(w, y):
    return w+y

def test2(v):
    global x
    print('x in test2 = ', x)
    return test1(x, v)

def test3():
    global x
    x = 2
    print('x in test3 = ', x)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    with Pool(processes=6) as p:
        z = p.map(test2, y)
    print(z)

if __name__ == '__main__':
    test3()

The output (incorrect) is

x in test3 =  2
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
x in test2 =  3
[4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
uday
  • You're using Windows, that's why. Why not just pass `x` as an argument to the function using `args`? – cs95 Oct 11 '17 at 01:30
  • @COLDSPEED, in my actual function, the size of `x` is very large, and the size of `y` is around 30,000 instead of just 10 in this dummy exercise. Since I am only calculating 6 of the 30,000 `y`'s at a time using multi-processing, I am trying to avoid passing something like `zip(y, x_replicated_30k_times)` to the `map` function. Why is Windows an issue here? – uday Oct 11 '17 at 01:34
  • Because Unix-like OSes rely on fork, whereas Windows machines work differently. It's linked to the reason you need `if __name__ == '__main__':` to prevent infinite recursion. – cs95 Oct 11 '17 at 01:36
  • I'm not sure what you mean. I have used `if __name__ == '__main__'` with both Ubuntu 14.04 and OSX Yosemite, and used code based on the `multiprocessing` toolbox in exactly the same way without making any changes – uday Oct 11 '17 at 01:38
  • windows multiprocessing why is __name__ == __main__ needed – cs95 Oct 11 '17 at 01:39
  • can you recommend a solution to make the code work? – uday Oct 11 '17 at 01:40
  • If you are using shared variables, I recommend `Manager`s: https://stackoverflow.com/questions/17377426/shared-variable-in-pythons-multiprocessing, and also https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes – cs95 Oct 11 '17 at 01:41
  • can you convert my code above to an example with `Manager` ? – uday Oct 11 '17 at 02:09
  • `global` doesn’t do anything at file scope (why does the parser even allow it?) or in your `test2` that doesn’t assign to it. – Davis Herring Oct 13 '17 at 20:10
  • as per python description, in `test2` global is required to access the global variable within a function – uday Oct 13 '17 at 21:08
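
The disagreement in the last two comments about `global` can be checked directly. The following is only a minimal sketch, and the names `counter`, `read_only`, `rebind` and `shadow` are made up for illustration: a function can read a module-level name without any declaration, and `global` is needed only when the function assigns to that name.

```python
counter = 10  # module-level (global) name, hypothetical example value


def read_only():
    # Reading a global needs no `global` statement.
    return counter + 1


def rebind():
    # Assigning to the module-level name does need `global`.
    global counter
    counter = 20


def shadow():
    # Without `global`, the assignment creates a local variable instead.
    counter = 99
    return counter


if __name__ == '__main__':
    print(read_only())  # uses the module-level value
    print(shadow())     # returns the local 99; the global is untouched
    rebind()
    print(read_only())  # now sees the rebound global
```

So in the question's `test2`, the `global x` line is harmless but not required, while in `test3` it is what makes `x = 2` change the module-level name.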

2 Answers


You have to define the variable x at module level, outside the functions. For instance, instead of the bare global x statement, write x = 0 (or any initial value you like), and keep the global declarations inside the functions just as you have them now. Hope that helps.

Aswini
  • That's exactly the solution and it works. I will award you the bounty as soon as SO allows me. I am not sure why someone downgraded the question - I couldn't find the fix in spite of searching for it many days (and wasting days on alternatives like COLDSPEED's suggestion to use `Manager` and similarly other solutions from other blogs). Many thanks for pointing out the fix. – uday Oct 13 '17 at 21:12
  • Actually, it is not yet the fix. If I remove the `global x` outside the functions with `x = 0` and run the code, `test2` never gets `x = 2` assignment within `test3` and returns `y` assuming `x=0`. Can you post a complete code? – uday Oct 13 '17 at 21:19

Your problem is that the variable is shared only within a single process, not across the processes of a multiprocessing pool. A global x works inside an individual process, but every worker process has its own copy of the module's globals, so an assignment made in one process is not seen by the others. To share a value across processes you need to use Value from multiprocessing. Below is updated code that works with multiprocessing:

from multiprocessing import Pool, Value
import numpy as np

xVal = Value('i', 0)

def test1(w, y):
    return w+y

def test2(v):
    x = xVal.value
    print('x in test2 = ', x)
    return test1(x, v)

def test3():
    xVal.value = 2

    print('x in test3 = ', xVal.value)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    with Pool(processes=6) as p:
        z = p.map(test2, y)
    print(z)

if __name__ == '__main__':
    test3()

The output of the program (tested on OSX) is as below

x in test3 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
x in test2 =  2
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

Edit-2

The program below should also work on Windows

from multiprocessing import Pool, Value
import numpy as np

xVal = None  # set under __main__ in the parent, and by sharedata in each worker

def sharedata(sharedData):
    # Pool initializer: runs once in every worker process
    global xVal
    xVal = sharedData

def test1(w, y):
    return w+y

def test2(v):
    x = xVal.value
    print('x in test2 = ', x)
    return test1(x, v)


def test3():
    xVal.value = 2
    print('x in test3 = ', xVal.value)
    y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    # Hand the shared Value to each worker explicitly instead of relying on inherited globals
    with Pool(processes=6, initializer=sharedata, initargs=(xVal,)) as p:
        z = p.map(test2, y)
    print('x in test3 = ', xVal.value)
    print(z)

if __name__ == '__main__':
    xVal = Value('i', 0)
    test3()
Tarun Lalwani
  • Didn't work on my machine. Are you using Windows, or Linux / OSX? When I run your code (in Windows) I get `x in test2 = 0`. – uday Oct 14 '17 at 18:00
  • unfortunately my Windows VM has no space left, so I had to test this on OSX only, hoping it works on Windows, will see if I can extend the space in Windows VM and test something over there – Tarun Lalwani Oct 14 '17 at 18:03
  • @uday, please check the latest answer – Tarun Lalwani Oct 14 '17 at 20:09