
Here is my approximate workflow:

import gc
import multiprocessing as mp
import pickle
import numpy as np
import psutil

# roughly 1000 arrays of one million random integers each
test = [np.random.choice(range(1, 1000), 1000000) for el in range(1, 1000)]
step_size = 10**4
for i in range(0, len(test), step_size):
    p = mp.Pool(10)
    temp_list = test[i:i + step_size]
    results = p.map(some_function, temp_list)

    gc.collect()
    mem = psutil.virtual_memory()
    print('Memory available in GB' + str(mem.available/(1024**3)))

    # write this chunk's results to its own pickle file
    with open('file_to_store' + str(int(i/step_size)) + '.pickle', 'wb') as f:
        pickle.dump(results, f)
    p.close()

It generates the following output:

Memory available in GB36.038265228271484
Memory available in GB23.011260986328125
Memory available in GB9.837642669677734

and then the error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-9-d17700b466c3> in <module>()
    260 
    261         with open(file_path_1+str(int(i/step_size))+'.pickle', 'wb') as f:
--> 262             pickle.dump(vec_1, f)
    263         with open(file_path_2+str(int(i/step_size))+'.pickle', 'wb') as f:
    264             pickle.dump(vec_2, f)

MemoryError: 

`some_function` does some minor processing and does not create any global variables that could linger in memory.

I do not understand why the amount of available memory keeps decreasing, or why it eventually runs out of memory.

user1700890
  • Have you checked to see that the processes are actually terminating? `close()` on its own doesn't wait for the process to terminate; you'd have to call `join()` as well. Or, better yet, if you're using a more recent version of Python, let the Pool manage itself using `with mp.Pool(10) as p:`. – senderle Jan 08 '18 at 15:59
  • You're creating a new pool in every iteration of your `for` loop. Why? – roganjosh Jan 08 '18 at 15:59
  • @roganjosh I've done the same thing at times, having found that trying to reuse pool processes for memory-intensive tasks leads to deadlocks. It may not be the right thing to do, but it's expedient for quick projects. – senderle Jan 08 '18 at 16:02
  • And yet, your issue is in memory. – roganjosh Jan 08 '18 at 16:06
  • @senderle, I am trying to figure out how to use `p.join()` properly, but simply inserting it before `p.close()` generated an `AssertionError` pointing at `p.join()` – user1700890 Jan 08 '18 at 16:08
  • If the processing part is minor then I'm not sure what you're trying to do with multiprocessing in the first place. There's overhead in spawning a pool, so doing it on each iteration is expensive. – roganjosh Jan 08 '18 at 16:10
  • @roganjosh It is minor in the sense that it is simple, but it is computationally intensive, so I have to use multiprocessing. It takes a parsed sentence and embeds it into R^n (both tokens and tags) – user1700890 Jan 08 '18 at 16:12
  • You would call `close()` *before* `join()`. Close stops sending tasks to the processes, `join()` actually waits for them to terminate. Also, `terminate()` tells the processes to stop without completing remaining tasks -- you might need to do that instead of `close()` at first, if there's some kind of error that's preventing a process from terminating on its own. – senderle Jan 08 '18 at 16:28
  • So to reiterate, `join()` is the *final* thing to call. – senderle Jan 08 '18 at 16:29
  • @senderle, even with `with mp.Pool(10) as p:`, memory keeps dropping and the same `MemoryError` is eventually raised – user1700890 Jan 08 '18 at 16:30
  • @senderle putting `p.terminate()` and then `p.join()` produced the same result. I guess I need to monitor explicitly what process/variable grows in memory. – user1700890 Jan 08 '18 at 16:51
  • Well -- I wonder if it could be a problem with the way `pickle` is working... have you looked at [this](https://stackoverflow.com/questions/17513036/pickle-dump-huge-file-without-memory-error)? It might not be a multiprocessing issue at all. – senderle Jan 08 '18 at 17:21
  • @senderle, Surprisingly, adding `del temp_list` and `del results` after pickling helped. There is so much of Python that I do not understand – user1700890 Jan 08 '18 at 18:02
  • Came back to this -- interesting that explicitly deleting objects was necessary! You could post this as a self-answer -- I'm sure people would find it helpful. – senderle Jun 28 '18 at 03:26
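
For reference, here is how the loop might look with the suggestions from the comments combined: a context-managed pool (so no manual `close()`/`join()` is needed; the `with` block tears the workers down on exit) plus the explicit `del` of the per-chunk objects that the asker reported as the fix. `some_function` below is only a trivial placeholder for the real embedding step, and the whole thing is an untested sketch, not the asker's actual code.

import gc
import multiprocessing as mp
import pickle
import numpy as np
import psutil

def some_function(arr):
    # placeholder only; the real function embeds parsed sentences into R^n
    return arr.sum()

if __name__ == '__main__':
    test = [np.random.choice(range(1, 1000), 1000000) for el in range(1, 1000)]
    step_size = 10**4

    for i in range(0, len(test), step_size):
        temp_list = test[i:i + step_size]

        # the with-block shuts the pool down on exit, no close()/join() calls needed
        with mp.Pool(10) as p:
            results = p.map(some_function, temp_list)

        with open('file_to_store' + str(i // step_size) + '.pickle', 'wb') as f:
            pickle.dump(results, f)

        # drop the chunk's data before the next iteration; this is the change
        # that reportedly made the MemoryError go away
        del temp_list
        del results
        gc.collect()

        mem = psutil.virtual_memory()
        print('Memory available in GB' + str(mem.available / (1024**3)))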

0 Answers