
I'm testing some code (trying to make it faster, but also trying to understand the differences). I have a loop that builds a table in memory. I then tried to multiprocess it, but the memory usage seems weird when I do. When I run it on its own, the table keeps growing until it takes all the memory on the system, but when I use multiprocessing it stays low the whole time, which makes me question what it's actually doing. I'm basically trying to recreate what the unmultiprocessed code does, just faster.

Here's some code (just add/remove items from the data variable to make it run faster or slower so you can watch the system processes; the multiprocessed version is at the top and the non-multi version at the bottom):

from multiprocessing import Pool
from multiprocessing.managers import BaseManager, DictProxy
from collections import defaultdict

class MyManager(BaseManager):
    pass

MyManager.register('defaultdict', defaultdict, DictProxy)

def test(i, x, T):
    target_sum = 1000
    # T[x, i] is True if 'x' can be solved
    # by a linear combination of data[:i+1]
    #T = defaultdict(bool)           # all values are False by default
    T[0, 0] = True                # base case

    for s in range(target_sum + 1): # range one higher than the sum, to include the sum itself
        #print s
        for c in range(s / x + 1):
            if T[s - c * x, i]:
                T[s, i + 1] = True


data = [2, 5, 8, 10, 12, 50]
pool = Pool(processes=2)
mgr = MyManager()
mgr.start()
T = mgr.defaultdict(bool)
T[0, 0] = True
for i, x in enumerate(data):    # i is index, x is data[i]
    pool.apply_async(test, (i, x, T))
pool.close()
pool.join()
pool.terminate()


print 'size of Table (with multiprocessing) is:', len(T)
count_of_true = []
for x in T.items():
    if T[x] == True:
       count_of_true.append(x)
print 'total number of true (with multiprocessing) is ', len(count_of_true)


#now lets try without multiprocessing
target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i+1]
T1 = defaultdict(bool)           # all values are False by default
T1[0, 0] = True                # base case


for i, x in enumerate(data):    # i is index, x is data[i]
    for s in range(target_sum + 1): # range one higher than the sum, to include the sum itself
        for c in range(s / x + 1):
            if T1[s - c * x, i]:
                T1[s, i + 1] = True

print 'size of Table (without multiprocessing) is ', len(T1)

count = []
for x in T1:
    if T1[x] == True:
        count.append(x)

print 'total number of true (without multiprocessing) is ', len(count)

As an experiment, I put both pieces of code into two files and ran them side by side. The two multiprocessing workers take about 20% each and use only 0.5% of memory apiece; the single process (without multiprocessing) uses 75% of a core and up to 50% of memory.

Lostsoul
  • You write: "When I run it on its own..." Do you mean setting Pool(processes=1)? – itsafire Feb 21 '12 at 15:21
  • Not exactly. In my code above I have two sections: one is wrapped in a multiprocessing pool and the other runs on its own (without the pool). – Lostsoul Feb 21 '12 at 15:23

1 Answer


If I understood your code right, the real problem is that you can't build your lookup table with multiprocessing like this.

This:

for i, x in enumerate(data):
    for s in range(target_sum + 1):
        for c in range(s / x + 1):  
            if T1[s - c * x, i]:
                T1[s, i + 1] = True

works because you're doing it one step after the other.
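
Concretely, each pass writes the column that the next pass reads. Running just the first pass (i = 0, x = 2) shows what it leaves behind for the i = 1 pass to consume (a standalone demo of mine, not part of the original code):

from collections import defaultdict

data = [2, 5, 8, 10, 12, 50]
target_sum = 1000
T1 = defaultdict(bool)
T1[0, 0] = True

# First pass only: i = 0, x = 2
for s in range(target_sum + 1):
    for c in range(s / 2 + 1):
        if T1[s - c * 2, 0]:    # reads column 0 (just the base case)
            T1[s, 1] = True     # writes column 1 for the next pass

print [s for s in range(11) if T1[s, 1]]    # [0, 2, 4, 6, 8, 10]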

While this:

def test(i, x, T):
    target_sum = 1000
    T[0, 0] = True
    for s in range(target_sum + 1):
        for c in range(s / x + 1):  
            if T[s - c * x, i]:
                T[s, i + 1] = True

# [...]

for i, x in enumerate(data):
    pool.apply_async(test, (i, x, T))

will not do the same thing, because you need your previous results in order to build the new ones, as in RecursivelyListAllThatWork().
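
You can see the failure mode directly: if the pass for i = 1 runs before (or at the same time as) the pass for i = 0, it finds column 1 empty and derives nothing. A minimal demonstration (mine, not from the post):

from collections import defaultdict

data = [2, 5]
target_sum = 10
T1 = defaultdict(bool)
T1[0, 0] = True

# Run *only* the pass for i = 1, skipping i = 0 -- effectively what a
# racing apply_async() worker can end up doing:
i, x = 1, data[1]
for s in range(target_sum + 1):
    for c in range(s / x + 1):
        if T1[s - c * x, i]:    # column 1 was never filled in
            T1[s, i + 1] = True

print sum(T1.values())          # prints 1: only the base case is True

The only way to keep the table correct while still going through the pool is to submit one item at a time and block until it finishes, which of course gains nothing over the plain loop. A sketch, reusing the test() function and manager setup from the question:

for i, x in enumerate(data):
    # apply() blocks until test() returns, so column i + 1 is complete
    # before the next item starts -- correct, but entirely serial.
    pool.apply(test, (i, x, T))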

There's also a bug in your counting, this:

for x in T.items():
    if T[x] == True:
       count_of_true.append(x)

Should be:

for x in T:
    if T[x] == True:
       count_of_true.append(x)

Also, it's better to compare with True using `is` rather than `==`, even though in your case you don't need the comparison at all:

for x in T:
    if T[x]:
       count_of_true.append(x)

As a side note, you don't actually need a defaultdict here, as I and others have already told you.
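
For example, since the only values ever stored are True, a plain set of reachable (sum, index) pairs does the same job, and its length is the count you're after. A sketch of the single-process version rewritten that way (my names, not yours):

reachable = set([(0, 0)])               # base case
for i, x in enumerate(data):
    for s in range(target_sum + 1):
        for c in range(s / x + 1):
            if (s - c * x, i) in reachable:
                reachable.add((s, i + 1))

print 'number of reachable states:', len(reachable)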

Rik Poggi
  • But even with processes=1, it doesn't max out memory the way it does completely without multiprocessing. I notice the same thing with another multiprocessing script I created (the single process spikes memory while the multi version doesn't seem to grow past a single point as the work gets larger). – Lostsoul Feb 21 '12 at 15:25
  • @Lostsoul: I don't think that matters. The problem is that you are rebuilding the whole table over and over with the for-loop. I've run your code and the multiprocessing table is almost 10 times bigger than the one without. As for memory usage, that might be because you were/are using one shared table instead of multiple ones, but that's not the case presented here. – Rik Poggi Feb 21 '12 at 15:36
  • But shouldn't multiprocessing still have one shared table (like without multi), or is each worker working on its own version of the table and then syncing it to the master or something? Sorry, I still don't understand the memory differences. Also, I read your updated answer, thanks for the tips (you're helping me become better). – Lostsoul Feb 21 '12 at 15:52
  • @Lostsoul: When you talked about a memory issue I thought you meant that your table had different dimensions with and without multiprocessing. Anyway, I ran the code again with my system monitor open *(not a very reliable test)* and barely saw any memory usage at all. – Rik Poggi Feb 21 '12 at 16:20
  • When I add about 3000 items to the list, the difference starts to show: the single process keeps growing but the multi doesn't grow at all. – Lostsoul Feb 21 '12 at 16:35
  • The number of `True` values in `T` can be counted with `sum(T.values())`. – martineau Feb 22 '12 at 11:07
  • @martineau: Actually, if `T` were a plain `dict`, with that algorithm it could just be `len(T)`. – Rik Poggi Feb 22 '12 at 13:40
  • @Rik: True... and yet another reason to use a `dict()` instead of a `defaultdict(bool)` -- actually, now that I think about it, a `set` would probably be even better. However, given the fundamental issue that filling the table can't be done concurrently, the point is irrelevant, I suppose... – martineau Feb 22 '12 at 17:57
  • @Lostsoul: Please don't get offended, but you are falling into the [mistake](http://blogs.msdn.com/b/oldnewthing/archive/2011/11/04/10233798.aspx) of breaking your problem into two parts, one easy and one impossible (or very, very hard), and then asking only about how the easy part could be done. You should ask a question about your real problem (which seems to be how to improve the building of your lookup table). – Rik Poggi Feb 22 '12 at 19:21
  • You're totally right, I might ask how to improve the building of the table. I realize the multiprocessing approach was flawed, but because it was faster than a single process I ran the loops again and again until the number of results stabilized, which is why I thought that if I could somehow speed up the reading and sharing of the lookup table I could solve it more easily. – Lostsoul Feb 22 '12 at 20:27
  • @Lostsoul: If the approach is fundamentally flawed, then it doesn't matter how fast or efficient it is because it's hardly better than any other one that doesn't work, including doing nothing at all -- which of course is the fastest (incorrect) method of all. – martineau Feb 23 '12 at 17:48