
So what I am trying to do ultimately is read a line, do some calculations with the info in that line, then add the result to some global object, but I can never seem to get it to work. For instance, test is always 0 in the code below. I know this is wrong, and I have tried doing it other ways, but it still isn't working.

import multiprocessing as mp

File = 'HGDP_FinalReport_Forward.txt'
#short_file = open(File)
test = 0

def pro(temp_line):
    global test
    temp_line = temp_line.strip().split()
    test = test + 1
    return len(temp_line)

if __name__ == "__main__":
    with open("HGDP_FinalReport_Forward.txt") as lines:
        pool = mp.Pool(processes = 10)
        t = pool.map(pro,lines.readlines())
RatDon
user1423020
  • Globals are generally a sign that you are doing something wrong. I advise changing the way your program works to avoid them - it will save you headaches in the long run, and there is always a better way. – Gareth Latty Jun 19 '12 at 21:34
  • The point of the multiprocessing module is that it spawns child processes rather than threads in the same process, with all the usual tradeoffs. Unfortunately, the documentation doesn't explain those tradeoffs at all, assuming you'll already know them. If you follow all of the "Programming guidelines" in the documentation, you may get away with not understanding, but you really should learn. – abarnert Jun 19 '12 at 23:14

2 Answers


The worker processes spawned by the pool get their own copy of the global variable and update that. They don't share memory unless you set that up explicitly. The easiest solution is to communicate the final value of test back to the main process, e.g. via the return value. Something like (untested):

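import multiprocessing as mp

def pro(temp_line):
    # Each call keeps its own local count; nothing is shared between workers.
    test = 0
    temp_line = temp_line.strip().split()
    test = test + 1
    return test, len(temp_line)

if __name__ == "__main__":
    with open("somefile.txt") as lines:
        pool = mp.Pool(processes = 10)
        tests_and_t = pool.map(pro,lines.readlines())
        # Split the (test, length) pairs into two tuples, then total the counts.
        tests, t = zip(*tests_and_t)
        test = sum(tests)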
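
If you do want a counter that really is shared, here is a minimal sketch (not part of the original answer) of setting that up explicitly with `multiprocessing.Value`, handed to each worker through the pool's `initializer`. The names `init_worker`, `counter` and `count_lines` are made up for illustration, and the sample lines stand in for the real file.

```python
import multiprocessing as mp

counter = None  # filled in inside each worker by init_worker (hypothetical helper)

def init_worker(shared_counter):
    """Store the shared Value in a module-level name inside each worker."""
    global counter
    counter = shared_counter

def pro(temp_line):
    fields = temp_line.strip().split()
    # get_lock() guards the read-modify-write so no increment is lost.
    with counter.get_lock():
        counter.value += 1
    return len(fields)

def count_lines(lines, processes=4):
    shared = mp.Value("i", 0)  # a C int placed in shared memory
    with mp.Pool(processes=processes,
                 initializer=init_worker,
                 initargs=(shared,)) as pool:
        lengths = pool.map(pro, lines)
    return shared.value, lengths

if __name__ == "__main__":
    # Sample input assumed for illustration only.
    total, lengths = count_lines(["a b c\n", "d e\n", "f\n"])
    print(total, lengths)
```

Returning the values, as in the code above, is usually the simpler route; a shared counter takes a lock on every line, which can cost more than the work it protects.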
Fred Foo
  • The key thing here is that, using `multiprocessing`, the threads (well, processes) don't share state. – Gareth Latty Jun 19 '12 at 21:36
  • +1 for the answer, and +1 @Lattyware. I wish the multiprocessing documentation were a little clearer on how "spawning processes using an API similar to the threading module" differs from "creating threads", because that would solve half the problems with the module on SO… – abarnert Jun 19 '12 at 23:15
  • Great stuff! It helped me with updating django models. Apparently the connection isn't forked and can be closed improperly by another process. To take care of that I used this approach but I didn't use zip, I just accessed the tuple elements from the list directly using a for loop, and then for each list item going through the tuple using tuple_element[index]. – radtek Apr 03 '14 at 21:57

Here is an example of using a global variable with multiprocessing.

We can clearly see that each process works with its own copy of the variable:

import multiprocessing
import time
import os
import sys
import random
def worker(a):
    oldValue = get()
    set(random.randint(0, 100))
    sys.stderr.write(' '.join([str(os.getpid()), str(a), 'old:', str(oldValue), 'new:', str(get()), '\n']))

def get():
    global globalVariable
    return globalVariable

globalVariable = -1
def set(v):
    global globalVariable
    globalVariable = v

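print(get())
set(-2)
print(get())

# The guard keeps the pool from being created again when a worker imports this module.
if __name__ == "__main__":
    processPool = multiprocessing.Pool(5)
    results = processPool.map(worker, range(15))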

Output:

27094 0 old: -2 new: 2 
27094 1 old: 2 new: 95 
27094 2 old: 95 new: 20 
27094 3 old: 20 new: 54 
27098 4 old: -2 new: 80 
27098 6 old: 80 new: 62 
27095 5 old: -2 new: 100 
27094 7 old: 54 new: 23 
27098 8 old: 62 new: 67 
27098 10 old: 67 new: 22 
27098 11 old: 22 new: 85 
27095 9 old: 100 new: 32 
27094 12 old: 23 new: 65 
27098 13 old: 85 new: 60 
27095 14 old: 32 new: 71
Bohdan