Here is my approximate workflow:
    import multiprocessing as mp
    import pickle
    import gc
    import numpy as np
    import psutil

    test = [np.random.choice(range(1, 1000), 1000000) for el in range(1, 1000)]
    step_size = 10**4
    for i in range(0, len(test), step_size):
        p = mp.Pool(10)
        temp_list = test[i:i + step_size]
        results = p.map(some_function, temp_list)
        gc.collect()
        mem = psutil.virtual_memory()
        print('Memory available in GB' + str(mem.available / (1024**3)))
        with open('file_to_store' + str(int(i / step_size)) + '.pickle', 'wb') as f:
            pickle.dump(results, f)
        p.close()
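For reference, here is a stripped-down, runnable sketch of the same batch pattern, using a trivial placeholder task `square` instead of my real `some_function`, small toy data instead of the large arrays, and the pool as a context manager so the worker processes are terminated after each batch:

```python
import multiprocessing as mp

def square(x):
    # trivial placeholder task; the real per-item work would go here
    return x * x

if __name__ == '__main__':
    data = list(range(20))
    step = 8
    all_results = []
    for i in range(0, len(data), step):
        batch = data[i:i + step]
        # the with-block terminates the worker processes after each batch
        with mp.Pool(2) as p:
            all_results.extend(p.map(square, batch))
    print(all_results[:5])  # prints [0, 1, 4, 9, 16]
```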
It generates the following output:
Memory available in GB36.038265228271484
Memory available in GB23.011260986328125
Memory available in GB9.837642669677734
and then this error:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-9-d17700b466c3> in <module>()
260
261 with open(file_path_1+str(int(i/step_size))+'.pickle', 'wb') as f:
--> 262 pickle.dump(vec_1, f)
263 with open(file_path_2+str(int(i/step_size))+'.pickle', 'wb') as f:
264 pickle.dump(vec_2, f)
MemoryError:
some_function does some minor processing and does not create any global variables that could linger in memory.
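Since the real some_function is not shown here, this is a hypothetical stand-in with the same characteristics (minor per-array processing, a small return value, no globals):

```python
import numpy as np

# Hypothetical stand-in for some_function -- the real implementation
# differs, but like this one it only reduces each large input array
# to a small result and keeps no state between calls.
def some_function(arr):
    return float(arr.mean())
```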
I do not understand why the amount of available memory keeps decreasing with each batch, and why the process eventually runs out of memory.