I wrote a script that runs tasks in parallel with multiprocessing, and these tasks need access to a very big dictionary (about 20 GB of memory; read-only, never modified).
The script works fine, but its RAM usage is huge when I run it on an 8-CPU server. I believe this is because the dictionary is declared global (so that all processes can access it) and ends up being copied into each worker process (8 x 20 GB -> 160 GB).
Is there a way to put this dictionary in memory shared by all processes, without making one copy per process?
I'm using Python 3.7 and a simplified version of my code looks like this:
from multiprocessing import Pool as ThreadPool
def function_1(filename):
    # read the file and do something with the data,
    # using the info stored in the global dict d
    # return some new data
    ...

d = {}  # fill dict d with a lot of info (the ~20 GB dictionary)
list_of_files = [file_name1, file_name2, file_name3, ... , file_name_876]
pool = ThreadPool(8)
mp_res = pool.map(function_1, list_of_files, chunksize=1)
pool.close()
pool.join()
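
For what it's worth, one direction I'm aware of is multiprocessing.Manager, which keeps a single copy of the dict in a separate manager process and hands each worker a proxy to it. A minimal sketch of that idea is below (function_2, shared_d, and the toy fill are placeholders of mine, not my real code):

from multiprocessing import Manager, Pool

def function_2(args):
    # look values up through the proxy instead of a per-process copy of d
    filename, shared_d = args
    return shared_d.get(filename)

if __name__ == "__main__":
    with Manager() as manager:
        shared_d = manager.dict()        # single copy, held by the manager process
        shared_d["file_name1"] = "info"  # placeholder fill
        files = ["file_name1", "file_name2"]
        with Pool(8) as pool:
            res = pool.map(function_2, [(f, shared_d) for f in files], chunksize=1)

This would avoid the 8 copies, but every lookup goes through IPC to the manager process, so I suspect it would be far too slow for a 20 GB dictionary that is read constantly. I'm hoping there is a faster way.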