
Here is a sample program for multiprocessing in Python. The memory usage of each process is ~2 to 3 times higher than the memory each process is supposed to use. With just one process the memory used is ~1.3 times more than expected, and the overhead grows with the number of processes.

For example, an array of 1000*1000*1000 float64 values should use about 8 GB of memory, but I see the usage go up to 25 GB with 8 processes running in parallel! I read that multiprocessing uses shared memory, so I am not sure where the memory is leaking. Here is the code:

# To use this code, please mind your RAM.
# If you have more RAM, try bigger arrays to see the difference more clearly.

from numpy import *
import multiprocessing as mp

a = arange(0, 2500, 5)
b = arange(0, 2500, 5)
c = arange(0, 2500, 5)  
a0 = 540. #random values
b0 = 26.
c0 = 826.
def rand_function(a, b, c, a0, b0, c0):
    Nloop = 100.
    def loop(Nloop, out):
        res_total = zeros((500, 500, 500), dtype = 'float') 
        n = 1
        while n <= Nloop:
            rad = sqrt((a-a0)**2 + (b-b0)**2 + (c-c0)**2)
            res_total = res_total + rad
            n +=1 
        out.put(res_total)
    out = mp.Queue() 
    jobs = []
    Nprocs = mp.cpu_count()
    print "No. of processors : ", Nprocs
    for i in range(Nprocs):
        p = mp.Process(target = loop, args=(Nloop/Nprocs, out)) 
        jobs.append(p)
        p.start()

    final_result = zeros((500,500,500), dtype = 'float')

    for i in range(Nprocs):
        final_result = final_result + out.get()

    for p in jobs:
        p.join()
test = rand_function(a,b,c,a0, b0, c0)

Can anyone please tell me where the memory is leaking, and how to overcome it? Thank you very much in advance.

geekygeek
  • How much do you think it is "supposed to use", and why? – Janne Karila Feb 27 '14 at 10:37
  • For example, an array of 1000*1000*1000 float64 values is supposed to use (1000*1000*1000*8)/(1024**3) GB of memory, right? I also verified this with a.nbytes, which tells you the memory used (see the snippet after these comments). – geekygeek Feb 27 '14 at 10:42
  • Just found this: [tracemalloc](http://docs.python.org/3.4/library/tracemalloc.html), included in Python 3.4. – User Feb 27 '14 at 12:32
  • There exists a `res_total` array for each process, so with 8 processes you expect to have at least 9 times the size of your data in memory (8 x `res_total` and 1 `final_result`). – moarningsun Feb 27 '14 at 20:26
  • @moarningsun, that is what is bothering me. I don't see a 9-times increase in memory, only around 2 to 3 times, which I was unable to trace. So you think it is inevitable? – geekygeek Feb 27 '14 at 20:52
  • It's inevitable to some degree, depending on what you actually want to do in parallel, but by using shared memory as pointed out by Janne Karila you can probably avoid having duplicates fill up the memory. – moarningsun Feb 28 '14 at 14:33
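
For reference, a minimal sketch of the size check mentioned in the comments (illustrative only, not part of the original post; it computes the expected size without allocating the full 1000*1000*1000 array):

import numpy as np

# Expected size of a (1000, 1000, 1000) float64 array, computed without allocating it.
shape = (1000, 1000, 1000)
itemsize = np.dtype(np.float64).itemsize      # 8 bytes per element
expected_bytes = np.prod(shape) * itemsize    # 8000000000 bytes
print(expected_bytes / float(1024 ** 3))      # ~7.45 GiB, roughly the "8 GB" quoted above

# For an array that is actually allocated, .nbytes reports the same figure:
small = np.zeros((100, 100, 100), dtype=np.float64)
print(small.nbytes)                           # 8000000 bytes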

2 Answers


Some things that use (a lot of) memory:

  1. res_total
  2. The right-hand side of res_total = res_total + rad creates a temporary array that, for a moment, exists simultaneously with res_total. Using += avoids that (see the sketch below).
  3. out.put(res_total) pickles the array, using roughly the same amount of memory.

That should explain why the memory use can be much higher than item 1 alone would suggest.
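
A minimal sketch of point 2 (illustrative only, with much smaller shapes than in the question):

import numpy as np

res_total = np.zeros((100, 100, 100))
rad = np.random.rand(100)

# Allocates a brand-new (100, 100, 100) array for the sum before rebinding the name,
# so the old res_total and the temporary exist at the same time:
res_total = res_total + rad

# Adds in place (broadcasting rad over the last axis); no full-size temporary is created:
res_total += rad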

Janne Karila
  • Thanks! Point 2 is done. But for point 3, is there a way to avoid the pickling in `out.put(res_total)`? – geekygeek Feb 27 '14 at 20:32
  • @geekygeek You could try to avoid putting big data in the queue and [use a numpy array in shared memory for multiprocessing](http://stackoverflow.com/q/7894791/222914) (see the sketch after these comments). – Janne Karila Feb 28 '14 at 06:39
  • Thanks! And is there a way to avoid initializing `res_total` or `final_result` as an array of zeros? This also contributes to the increase in memory. – geekygeek Feb 28 '14 at 07:20
  • @geekygeek I don't think creating the array with `np.zeros` uses more memory than any other way of creating it. – Janne Karila Feb 28 '14 at 07:30
  • The numpy array in shared memory doesn't work with larger arrays: https://bitbucket.org/cleemesser/numpy-sharedmem/overview. The author says it works for arrays that use less than 1 GB of memory. Do you have any comments regarding it? – geekygeek Mar 07 '14 at 09:03
  • @geekygeek No, I haven't tried that approach myself. – Janne Karila Mar 07 '14 at 09:48
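
A rough sketch of the shared-memory approach from the link above (illustrative only; it relies on multiprocessing.Array with lock=False plus numpy.frombuffer, and is not the asker's exact code):

import multiprocessing as mp
import numpy as np

def worker(shared, shape, i):
    # Re-wrap the shared buffer as a numpy array; no data is copied.
    arr = np.frombuffer(shared, dtype=np.float64).reshape(shape)
    arr[i, :] += i  # each worker writes only its own row, so no lock is needed here

if __name__ == '__main__':
    shape = (4, 1000)
    # lock=False returns a raw shared buffer that np.frombuffer can wrap directly.
    shared = mp.Array('d', shape[0] * shape[1], lock=False)
    jobs = [mp.Process(target=worker, args=(shared, shape, i)) for i in range(shape[0])]
    for p in jobs:
        p.start()
    for p in jobs:
        p.join()
    result = np.frombuffer(shared, dtype=np.float64).reshape(shape)
    print(result[:, 0])  # [ 0.  1.  2.  3.]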

On an unrelated note, if what you want to do is sum a number of arrays in parallel, it's better to use a multiprocessing.Pool so that you don't have to handle the output of the loop yourself. Also, your code assigns the same task to all the worker processes; I am not sure whether that was intentional.

import numpy as np
import multiprocessing as mp


def loop(arg):
    max_n, a, b, c, a0, b0, c0 = arg
    # Each worker accumulates its own partial sum, the same size as the final output.
    res_total = np.zeros(shape, dtype=np.float)
    print 'starting'
    for _ in range(max_n):
        rad = np.sqrt((a - a0) ** 2 + (b - b0) ** 2 + (c - c0) ** 2)
        res_total = res_total + rad
    print 'done'
    return res_total


def rand_function(a, b, c, a0, b0, c0):
    c_cpu = mp.cpu_count()
    n_loop = 10
    print "No. of processors : ", c_cpu
    pool = mp.Pool(c_cpu)
    # Every worker gets an identical chunk of the iterations; Pool collects the results.
    out = pool.map(loop, [(n_loop / c_cpu, a, b, c, a0, b0, c0)
                          for _ in range(c_cpu)])

    print 'collating'
    # Sum the partial results from all workers in place.
    final_result = np.zeros(shape, dtype='float')
    for i in out:
        final_result += i
    print final_result.shape


shape = (50, 50, 50)
rand_function(np.arange(0, 250, 5), np.arange(0, 250, 5), 
                  np.arange(0, 250, 5), 540, 26, 826)

On my machine each worker process uses around a gigabyte of memory. Your original code used around 1.4 GB per worker to begin with (and then grew to 2 GB). I suspect this has to do with the output queue being modified, which triggers the OS's copy-on-write (I am not sure about that, though).

mbatchkarov
  • I see `multiprocessing.Pool` using more memory than `multiprocessing.Process`. For smaller arrays you don't see much difference, but if I run the `multiprocessing.Pool` solution you have given for an array of 500*500*500, the memory goes up to 18 GB, whereas with `multiprocessing.Process` it goes up to 12 GB, which is still high. – geekygeek Feb 27 '14 at 13:09