
I wrote a script that processes tasks with multiprocessing, and these tasks need access to a very big dictionary (about 20 GB of memory; read-only, no modification).

My script works perfectly fine, but the RAM usage is huge when I run it on an 8-CPU server. I believe this is because the dictionary is set as 'global' (so that all processes have access to it) and is therefore copied into each process (8 x 20 GB -> 160 GB).

Is there a way to put this dictionary in memory shared by all processes, without making one copy of it per process?
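
For example, would something along these lines do what I want? (This is only a minimal sketch with toy data, not my real code: `worker` and `init_worker` are placeholder names, and I don't know whether a managed proxy is realistic for a 20 GB dictionary, since every lookup seems to go through the manager's server process.)

from multiprocessing import Manager, Pool

def init_worker(proxy):
    # keep the proxy in a module-level name so each worker can reach it
    global shared_d
    shared_d = proxy

def worker(key):
    # the dict itself lives in the manager's server process;
    # this lookup is forwarded there instead of reading a local copy
    return shared_d[key] * 2

if __name__ == "__main__":
    with Manager() as manager:
        shared_d = manager.dict({"a": 1, "b": 2, "c": 3})  # toy stand-in for the 20 GB dict
        with Pool(8, initializer=init_worker, initargs=(shared_d,)) as pool:
            print(pool.map(worker, ["a", "b", "c"]))  # -> [2, 4, 6]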

I'm using Python 3.7 and a simplified version of my code looks like this:

from multiprocessing import Pool as ThreadPool 

def function_1(filename):
    # read the file and do something with the data, depending on the info stored in dict d
    # return some new data
    ...

# d is a module-level (global) dict so that all workers can access it
d = {}
# fill dict d with a lot of info (about 20 GB)

list_of_files = [file_name1, file_name2, file_name3, ... , file_name_876]

pool = ThreadPool(8) 
mp_res = pool.map(function_1, list_of_files, chunksize=1)
pool.close() 
pool.join()      
Romain
  • You could look at multiprocessing [Manager](https://stackoverflow.com/questions/22487296/multiprocessing-in-python-sharing-large-object-e-g-pandas-dataframe-between) – quamrana Apr 24 '20 at 09:53
  • Why are you calling `multiprocessing.Pool` `ThreadPool`, that's misleading. – juanpa.arrivillaga Apr 24 '20 at 10:07

0 Answers