I am using multiprocessing to speed up my program (it helps dramatically), but it is crucial that a certain global variable gets updated. The global variable is only used in the same class that uses multiprocessing, so maybe there is a workaround for this variable not getting updated. Here is the code for a test I am using to try to solve this issue:

from multiprocessing import Pool, cpu_count

aylmao = []

def test(a):
    aylmao.append(a)


if __name__ == '__main__':
    d = [1,2,3,4,5,6,7,8,9]
    pool = Pool(cpu_count() * 2)
    pool.map(test, d)

    print(aylmao)

So in my main code I have a function that is called using pool.map, and it updates this global variable. At the end of the program, the information in the global variable is pickled so that I can continue where my program left off. However, when I use pool.map, the global variable is empty by the time it is printed, and I am unsure how to work around that. Any help is greatly appreciated, as pool.map vastly increases the speed of my program.

If I instead run the code like this:

aylmao = []

def test(a):
    aylmao.append(a)


if __name__ == '__main__':
    d = [1,2,3,4,5,6,7,8,9]
    for i in d:
        test(i)

    print(aylmao)

The output is [1, 2, 3, 4, 5, 6, 7, 8, 9], which is exactly what I want. But when using pool.map(test, d), the output is []. How can I make sure the global variable is updated when using pool.map(test, d)?

Pookie
  • What does the cpu_count() function do? – jits_on_moon Jul 19 '18 at 03:08
  • Why would `pool.map` empty the variable? – Barmar Jul 19 '18 at 03:08
  • I understand that this is a toy code, but if your real code is anything similar, you should be aware that `pool.map` will aggregate returns from the function. So if `test` just returns `a`, `aylmao = pool.map(...)` does exactly what (you say) you want. (If your toy code is sufficiently different than your real code that this doesn't work, then give us a better toy code :D ) – Amadan Jul 19 '18 at 03:08
  • @Amadan It sounds like he's using the global variable to hold incremental state of all the threads, not just their return values. He says he uses this to resume where he left off. – Barmar Jul 19 '18 at 03:09
  • I suspect the problem is that you're not properly reloading the variable from the saved state when you restart. I doubt it's related to multiprocessing. – Barmar Jul 19 '18 at 03:11
  • I updated the question to add a lot of clarity – Pookie Jul 19 '18 at 03:11
  • @Barmar I believe the issue is when using multiprocessing the global states for each process are different. I am looking for a way to update this global variable in a way such that all processes can run and update the global variable. Also could you elaborate on what you mean by updating the variable from the saved state. – Pookie Jul 19 '18 at 03:13
  • If you just want to have a variable you can update from all processes, how about using `multiprocessing.Manager`? – Amadan Jul 19 '18 at 03:13
  • @Amadan I think you might be on to what I need. Could you write an answer that takes this toy problem and uses Manager to produce what I am looking for, so I can replicate it in my main code (and give the answer to anyone else looking)? – Pookie Jul 19 '18 at 03:14
  • Possible duplicate of [Python multiprocessing global variable updates not returned to parent](https://stackoverflow.com/questions/11055303/python-multiprocessing-global-variable-updates-not-returned-to-parent) – Barmar Jul 19 '18 at 03:17
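
The suggestion in the comments above, that pool.map already collects whatever the worker function returns, can be sketched as follows. This is only an illustration for the toy problem: the names `compute` and `gather` are hypothetical, and it assumes each worker's contribution can be expressed as a return value rather than as incremental shared state.

```python
from multiprocessing import Pool, cpu_count


def compute(a):
    # Return the result instead of appending to a module-level global.
    return a


def gather(values):
    # Hypothetical helper: the pool is created only when this is called.
    with Pool(cpu_count() * 2) as pool:
        # pool.map returns the results in the same order as the inputs.
        return pool.map(compute, values)


if __name__ == '__main__':
    aylmao = gather([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(aylmao)
```

Because the results are sent back to the parent process, the list in the parent is populated without any state being shared between processes.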

1 Answer

Using multiprocessing.Manager:

from multiprocessing import Manager, Pool, cpu_count

manager = Manager()
aylmao = manager.list()

def test(a):
    aylmao.append(a)

d = [1,2,3,4,5,6,7,8,9]
pool = Pool(cpu_count() * 2)
pool.map(test, d)

print(aylmao)
# => [1, 2, 5, 4, 3, 6, 8, 7, 9]  (the order varies from run to run)
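
On platforms where worker processes are started by re-importing the main module (the default on Windows), creating the Manager and Pool at module level raises a RuntimeError about the bootstrapping phase. A minimal sketch of one way around this is shown below. It is not the code above: it assumes the shared list is passed to the worker explicitly with functools.partial, and the names `record` and `run` are hypothetical.

```python
from functools import partial
from multiprocessing import Manager, Pool, cpu_count


def record(shared, a):
    # `shared` is a Manager list proxy passed in explicitly, so the worker
    # does not depend on a module-level global existing in the child process.
    shared.append(a)


def run(values):
    # Hypothetical helper: the Manager and Pool are created only when called.
    with Manager() as manager:
        shared = manager.list()
        with Pool(cpu_count() * 2) as pool:
            pool.map(partial(record, shared), values)
        # Copy into a plain list before the manager shuts down.
        return list(shared)


if __name__ == '__main__':
    # Nothing is started at import time, only when run as a script.
    print(sorted(run([1, 2, 3, 4, 5, 6, 7, 8, 9])))
```

The order in which items are appended depends on worker scheduling, which is why the result is sorted before printing.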

EDIT: "What if aylmao was a map that mapped a string to an object I created? would it still work?"

from multiprocessing import Manager, Pool, cpu_count

class Foo:
    def __init__(self, n):
        self.n = n
    def __repr__(self):
        return "Foo(%d)" % self.n

manager = Manager()
aylmao = manager.dict()

def test(a):
    aylmao[str(a)] = Foo(a)

d = [1,2,3,4,5,6,7,8,9]
pool = Pool(cpu_count() * 2)
pool.map(test, d)

print(aylmao)
# => {'1': Foo(1), '3': Foo(3), '2': Foo(2), '4': Foo(4), '5': Foo(5), '7': Foo(7), '6': Foo(6), '8': Foo(8), '9': Foo(9)}  (the key order varies from run to run)
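
Since the question mentions pickling the collected state at the end of the program, one option is to copy the Manager dict into a plain dict first and pickle that copy. The sketch below is an assumption about how this could look, not part of the original program: `record` and `snapshot` are hypothetical names, the pool size of 4 is arbitrary, and plain integers stand in for the custom objects.

```python
import pickle
from functools import partial
from multiprocessing import Manager, Pool


def record(shared, a):
    # Store a plain, picklable value under a string key.
    shared[str(a)] = a * a


def snapshot(values):
    # Hypothetical helper: fill a Manager dict, then return a plain dict copy.
    with Manager() as manager:
        shared = manager.dict()
        with Pool(4) as pool:
            pool.map(partial(record, shared), values)
        # dict(...) copies the contents out of the proxy into an ordinary dict.
        return dict(shared)


if __name__ == '__main__':
    state = snapshot([1, 2, 3])
    blob = pickle.dumps(state)  # pickle.dump(state, f) would write to a file
    print(pickle.loads(blob) == state)
```

The plain dict no longer depends on the manager process, so it can be saved and reloaded on a later run.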
Amadan
  • What if aylmao was a map that mapped a string to an object I created? would it still work? – Pookie Jul 19 '18 at 03:19
  • I am getting a runtime error saying: "An attempt has been made to start a new process before the current process has finished its bootstrapping phase." – Pookie Jul 19 '18 at 20:51
  • Are you possibly on Windows, and trying to freeze your code (py2exe, pyInstaller, cx_Freeze, bbFreeze...)? – Amadan Jul 20 '18 at 01:41
  • I am on Windows. I don't think I am using any of what you mentioned (I have never heard of those things), but I can't seem to declare any variable that uses manager as a global variable. – Pookie Jul 20 '18 at 02:45
  • Dunno, me not on Windows. Try [this](https://stackoverflow.com/questions/18204782/runtimeerror-on-windows-trying-python-multiprocessing), see if it helps. – Amadan Jul 20 '18 at 04:00