I'm testing some code, both to make it faster and to understand the differences. I have a loop that builds a table in memory. I then tried to run it with multiprocessing, but the memory usage seems odd. When I run the loop on its own, the table keeps growing until it takes all the memory on the system, but with multiprocessing the memory usage stays low the whole time, which makes me question what it's doing. I'm trying to quickly recreate what the non-multiprocessed code does.
Here's some code (add or remove items from the data variable to make it run faster or slower and watch the process on the system; the multiprocessing version is at the top and the non-multiprocessing version is at the bottom):
from multiprocessing import Pool
from multiprocessing.managers import BaseManager, DictProxy
from collections import defaultdict

class MyManager(BaseManager):
    pass

# expose defaultdict through the manager so the workers can share one table
MyManager.register('defaultdict', defaultdict, DictProxy)

def test(i, x, T):
    target_sum = 1000
    # T[s, i] is True if the sum 's' can be made
    # by a linear combination of data[:i]
    #T = defaultdict(bool) # all values are False by default
    T[0, 0] = True # base case
    for s in range(target_sum + 1): # one higher than the sum, to include the sum itself
        #print s
        for c in range(s // x + 1): # // keeps the integer division explicit
            if T[s - c * x, i]:
                T[s, i + 1] = True

data = [2,5,8,10,12,50]
pool = Pool(processes=2)
mgr = MyManager()
mgr.start()
T = mgr.defaultdict(bool)
T[0, 0] = True
for i, x in enumerate(data): # i is index, x is data[i]
    pool.apply_async(test, (i, x, T))
pool.close()
pool.join()
pool.terminate()

print('size of Table (with multiprocessing) is: %d' % len(T))
count_of_true = []
for key, value in T.items(): # items() yields (key, value) pairs, so test the value
    if value:
        count_of_true.append(key)
print('total number of true (with multiprocessing) is %d' % len(count_of_true))
# now let's try without multiprocessing
target_sum = 100
# T1[s, i] is True if the sum 's' can be made
# by a linear combination of data[:i]
T1 = defaultdict(bool) # all values are False by default
T1[0, 0] = True # base case
for i, x in enumerate(data): # i is index, x is data[i]
    for s in range(target_sum + 1): # one higher than the sum, to include the sum itself
        for c in range(s // x + 1):
            if T1[s - c * x, i]:
                T1[s, i + 1] = True

print('size of Table (without multiprocessing) is %d' % len(T1))
count = []
for x in T1:
    if T1[x]:
        count.append(x)
print('total number of true (without multiprocessing) is %d' % len(count))
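For a like-for-like check on a small input, the sequential loop can also be wrapped in a function, so the table size and the number of True entries are easy to inspect. This is only a sketch written for Python 3: build_table and count_true are hypothetical helper names, and the inputs [2, 5] with a target of 10 are assumptions chosen to keep the table small enough to check by hand.

```python
from collections import defaultdict


def build_table(data, target_sum):
    """Build the same table as the sequential loop (hypothetical helper).

    table[s, i] is True if the sum s can be made from the first i items.
    """
    table = defaultdict(bool)  # reading a missing key stores False for it
    table[0, 0] = True  # base case: a sum of 0 needs no items
    for i, x in enumerate(data):
        for s in range(target_sum + 1):
            for c in range(s // x + 1):
                if table[s - c * x, i]:
                    table[s, i + 1] = True
    return table


def count_true(table):
    """Count the entries whose value is True (hypothetical helper)."""
    return sum(1 for value in table.values() if value)


if __name__ == "__main__":
    small = build_table([2, 5], 10)
    print("entries:", len(small))
    print("true entries:", count_true(small))
```

Because defaultdict stores a False entry for every key that is merely read, len(table) also counts those lookups, which is part of why the table grows so quickly as target_sum increases.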
As an experiment, I put the two pieces of code into two separate files and ran them side by side. The two multiprocessing processes take about 20% CPU, and each uses only 0.5% of memory. The single process (without multiprocessing) uses 75% of a core and up to 50% of memory.
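To put numbers on the memory side of this comparison from inside Python, one option is the standard tracemalloc module. The following is a rough sketch for Python 3, not a measurement of the code above: fill_table and peak_bytes are hypothetical helper names, and the small data and target values are assumptions picked to keep it fast. Note that tracemalloc only tracks allocations made by the process that calls it, so it would not see memory held by a separate manager process.

```python
import tracemalloc
from collections import defaultdict


def fill_table(data, target_sum):
    """Sequential table build, same loop shape as the code above."""
    table = defaultdict(bool)
    table[0, 0] = True
    for i, x in enumerate(data):
        for s in range(target_sum + 1):
            for c in range(s // x + 1):
                if table[s - c * x, i]:
                    table[s, i + 1] = True
    return table


def peak_bytes(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    # Assumed sample sizes; raise the targets to watch the peak grow.
    for target in (50, 500):
        table, peak = peak_bytes(fill_table, [2, 5], target)
        print("target=%d entries=%d peak=%d bytes" % (target, len(table), peak))
```

Running the same measurement with increasing targets gives a quick view of how the table size drives memory in the single-process version.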