I'm trying to alter a dictionary in Python inside a process pool environment, but the dictionary isn't changed when the pool finishes.
Here's a minimal example of the problem (the output batch_input is all zeros, although inside per_batch_build the relevant values do change):

from multiprocessing import Pool, freeze_support
import numpy as np
import itertools

def test_process():
    batch_size = 2
    batch_input = {'part_evecs': np.zeros((2, 10, 10)),
                   'model_evecs': np.zeros((2, 10, 10)),
                   }

    batch_model_dist = np.zeros((2, 10, 10))

    pool = Pool(4)
    batch_output = pool.map(per_batch_build, itertools.izip(itertools.repeat(batch_input),
                                                            itertools.repeat(batch_model_dist),
                                                            list(range(batch_size))))
    pool.close()
    pool.join()

    return batch_input, batch_model_dist


# @profile
# def per_batch_build(batch_input, batch_model_dist, batch_part_dist, dataset, i_batch):
def per_batch_build(tuple_input):
    batch_input, batch_model_dist, i_batch = tuple_input

    batch_model_dist[i_batch] = np.ones((10,10))

    batch_input['part_evecs'][i_batch] = np.ones((10,10))
    batch_input['model_evecs'][i_batch] = np.ones((10,10))

But unfortunately batch_input and batch_model_dist are still all zeros when the pool finishes, although printing batch_input inside per_batch_build shows it is not zero.

Using the solutions suggested in previous discussions (a Manager dict and a shared ctypes Array), the result stays the same: the output arrays are still all zeros.

from multiprocessing import Pool, freeze_support, Manager, Array
import numpy as np
import itertools
import ctypes

def test_process():
    manager = Manager()

    shared_array_base = Array(ctypes.c_double, [0] * (2*10*10))
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape((2,10,10))

    batch_size = 2
    batch_input = manager.dict({'part_evecs': shared_array,
                               # 'model_evecs': np.zeros((2, 10, 10)),
                               })


    batch_model_dist = np.zeros((2, 10, 10))

    pool = Pool(4)
    batch_output = pool.map(per_batch_build, itertools.izip(itertools.repeat(batch_input),
                                                            itertools.repeat(batch_model_dist),
                                                            list(range(batch_size))))
    pool.close()
    pool.join()

    return batch_input, batch_model_dist


# @profile
# def per_batch_build(batch_input, batch_model_dist, batch_part_dist, dataset, i_batch):
def per_batch_build(tuple_input):
    batch_input, batch_model_dist, i_batch = tuple_input

    batch_model_dist[i_batch] = np.ones((10,10))

    batch_input['part_evecs'][i_batch] = np.ones((10,10))
    # batch_input['model_evecs'][i_batch] = np.ones((10,10))
DsCpp

1 Answer

You are changing a copy of the object: Pool.map pickles each argument, so per_batch_build receives and mutates its own copy inside the worker process. Because the variables are named identically in both functions, this is easy to miss.

Add print(id(batch_model_dist)) inside both functions and see for yourself.
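
For example, a self-contained sketch along these lines (illustrative names, Python 3) makes the copying visible:

from multiprocessing import Pool
import numpy as np

def worker(a):
    # Pool.map pickles its arguments, so 'a' is an unpickled copy
    # living in the worker process, not the parent's array.
    print('worker id:', id(a))
    a[:] = 1  # mutates only the worker's copy

if __name__ == '__main__':
    arr = np.zeros(3)
    print('parent id:', id(arr))
    with Pool(2) as pool:
        pool.map(worker, [arr, arr])
    print(arr)  # still [0. 0. 0.] in the parent

The ids printed in the workers differ from the parent's, and the parent's array stays zero: each process operates on its own copy.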

[Edit] I should probably also link a related answer, for example:

Is shared readonly data copied to different processes for multiprocessing?
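
Often the simplest fix is not to share mutable state at all: have each worker return its piece and write the results back in the parent, since Pool.map already ships the return values across. Here is a minimal sketch of that approach, reusing the question's shapes (Python 3, so zip would replace itertools.izip; this is one idiomatic option, not the only one):

from multiprocessing import Pool
import numpy as np

def per_batch_build(i_batch):
    # compute and *return* the per-batch results instead of
    # mutating copies in place
    part = np.ones((10, 10))
    model = np.ones((10, 10))
    dist = np.ones((10, 10))
    return i_batch, part, model, dist

def test_process():
    batch_size = 2
    batch_input = {'part_evecs': np.zeros((2, 10, 10)),
                   'model_evecs': np.zeros((2, 10, 10))}
    batch_model_dist = np.zeros((2, 10, 10))

    with Pool(4) as pool:
        results = pool.map(per_batch_build, range(batch_size))

    # assemble in the parent process, where the original arrays live
    for i_batch, part, model, dist in results:
        batch_input['part_evecs'][i_batch] = part
        batch_input['model_evecs'][i_batch] = model
        batch_model_dist[i_batch] = dist

    return batch_input, batch_model_dist

if __name__ == '__main__':
    batch_input, batch_model_dist = test_process()
    print(batch_input['part_evecs'].sum())  # 200.0, not 0.0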

Adam Owczarczyk
  • Thank you very much for your response. As batch_input is a dictionary and the answer you provided is for an ndarray, how can I accomplish it here? Or is the dictionary not copied, too? – DsCpp May 27 '19 at 15:13
  • I'm not an expert on multiprocessing, but using the search function I found an answer here: https://stackoverflow.com/questions/6832554/multiprocessing-how-do-i-share-a-dict-among-multiple-processes You may also want to use a specialized library that will do everything for you, like dask. – Adam Owczarczyk May 27 '19 at 15:18
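
For completeness, the Manager approach from that link has a subtlety that explains why the second snippet in the question also fails: fetching a value from a manager.dict() returns a copy, so mutating it in place is never propagated back; you have to assign the value through the proxy. A minimal sketch, assuming Python 3 (the manager lock makes the read-modify-write atomic when several workers touch the same key):

from multiprocessing import Pool, Manager
import numpy as np

def per_batch_build(args):
    shared, lock, i_batch = args
    with lock:  # guard the read-modify-write against concurrent workers
        arr = shared['part_evecs']   # proxy access returns a copy
        arr[i_batch] = np.ones((10, 10))
        shared['part_evecs'] = arr   # assigning back through the proxy propagates

def test_process():
    manager = Manager()
    shared = manager.dict({'part_evecs': np.zeros((2, 10, 10))})
    lock = manager.Lock()
    with Pool(4) as pool:
        pool.map(per_batch_build, [(shared, lock, i) for i in range(2)])
    return shared['part_evecs']  # rows 0 and 1 are now ones

if __name__ == '__main__':
    print(test_process().sum())  # 200.0

Note that this copies the whole array on every proxy access, so for large arrays the return-and-assemble pattern shown in the answer above scales better.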