
Similar to this question: How to share a variable in 'joblib' Python library

I want to share a variable in joblib. However, my problem is different: I have a huge variable (2-3 GB of RAM) and I want all my workers to read from it. They will never write to it. Something like:

from joblib import Parallel, delayed

def func(varThatChange, varToRead):
    # do something with varToRead (read-only) depending on varThatChange
    return results

def main():
    results = Parallel(n_jobs=100)(delayed(func)(varThatChange, varToRead) for varThatChange in listVars)

I cannot pass it the normal way because copying the variable takes a long time; moreover, I run out of memory.

How can I share it?


1 Answer


If your data/variable can be indexed, you can use an approach like this:

from joblib import Parallel, delayed
import math
import numpy as np

# dummy data
big_data = np.arange(1000)
# size of the data
data_size = len(big_data)
# number of chunks the data should be divided into for multiprocessing
num_chunks = 12
# size of one chunk (rounded up so the data really ends up in num_chunks chunks)
chunk_size = math.ceil(data_size / num_chunks)
# start/end indices of the chunks
chunk_ind = [[i, i + chunk_size] for i in range(0, data_size, chunk_size)]

# function that does the data processing
def processing_func(segment):
    # slice out the chunk; the dummy `* 1` stands in for the real (read-only) processing
    x = big_data[segment[0] : segment[1]] * 1
    return x

# results of the parallel processing - one array per chunk
parallel_results = Parallel(n_jobs=10)(delayed(processing_func)(i) for i in chunk_ind)
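
If varToRead is a NumPy array, a related option (not part of the answer above, just a sketch under that assumption) is to keep the original call signature from the question and let joblib memory-map the big argument: Parallel's max_nbytes and mmap_mode options dump large NumPy array arguments to a temporary memmap, so worker processes read from one shared, file-backed copy instead of each receiving their own. The names func, listVars and varToRead are the placeholders from the question; the sizes are made up.

from joblib import Parallel, delayed
import numpy as np

# stand-in for the 2-3 GB read-only array from the question
varToRead = np.arange(10_000_000)
listVars = range(100)

def func(varThatChange, varToRead):
    # purely read-only work on varToRead; a dummy lookup here
    return varToRead[varThatChange]

# array arguments bigger than max_nbytes are dumped to a temporary memmap and
# opened read-only ('r') in the workers, avoiding a full copy per worker
results = Parallel(n_jobs=10, max_nbytes='1M', mmap_mode='r')(
    delayed(func)(varThatChange, varToRead) for varThatChange in listVars
)

If the per-item work releases the GIL (e.g. heavy NumPy operations), passing prefer='threads' to Parallel also avoids copying, since threads share the parent process's memory.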