How to write to a shared variable in python joblib

Question

The following code parallelizes a for-loop.

import networkx as nx
import numpy as np
from joblib import Parallel, delayed
import multiprocessing

def core_func(repeat_index, G, numpy_array_2D):
  for u in G.nodes():
    numpy_array_2D[repeat_index][u] = 2
  return

if __name__ == "__main__":
  G = nx.erdos_renyi_graph(100000, 0.99)
  nRepeat = 5000
  numpy_array = np.zeros([nRepeat, G.number_of_nodes()])
  Parallel(n_jobs=4)(delayed(core_func)(repeat_index, G, numpy_array) for repeat_index in range(nRepeat))
  print(np.mean(numpy_array))

As can be seen, the expected value to be printed is 2. However, when I run my code on a cluster (multi-core, shared memory), it returns 0.0.

I think the problem is that each worker creates its own copy of the numpy_array object, and the one created in the main function is not updated. How can I modify the code such that the numpy array numpy_array can be updated?

So, have you decided on the answers? ;-) – Sergey Vasilyev Oct 24 '17 at 20:08

score 26 · Accepted Answer · edited Jun 19 '19 at 13:43

joblib uses the multiprocessing pool of processes by default, as its manual says:

Under the hood, the Parallel object create a multiprocessing pool that forks the Python interpreter in multiple processes to execute each of the items of the list. The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax.

This means that every process inherits the original state of the array, but whatever it writes into it is lost when the process exits. Only the function's return value is delivered back to the calling (main) process. Your function does not return anything, so None is returned.
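Since return values are the one thing that does come back from the workers, one option is to have each task return its row and assemble the array in the main process. The following is only a minimal sketch under that assumption: compute_row and build_array are hypothetical names, the sizes are kept tiny, and the graph is left out for brevity.

```python
import numpy as np
from joblib import Parallel, delayed


def compute_row(repeat_index, n_nodes):
    # Hypothetical stand-in for core_func: build this task's row and
    # return it instead of writing into an array owned by the parent.
    return np.full(n_nodes, 2.0)


def build_array(n_repeat=8, n_nodes=16):
    # Parallel returns the results as a list, in submission order.
    rows = Parallel(n_jobs=2)(
        delayed(compute_row)(i, n_nodes) for i in range(n_repeat)
    )
    return np.vstack(rows)


if __name__ == "__main__":
    result = build_array()
    print(result.shape, np.mean(result))
```

This keeps the process-based backend, at the cost of sending every row back to the main process.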

To make the shared array modifiable, you have two options: threads or shared memory.

Threads, unlike processes, share memory. So you can write to the array and every job will see the change. According to the joblib manual, it is done this way:

  Parallel(n_jobs=4, backend="threading")(delayed(core_func)(repeat_index, G, numpy_array) for repeat_index in range(nRepeat))

When you run it:

$ python r1.py 
2.0

However, once you start writing more complex things into the array, make sure you properly guard the data (or pieces of it) with locks, or you will hit race conditions.
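In the question's code each task writes only to its own row, so no lock is needed there. A lock matters once several tasks update the same element. Here is a rough sketch of that case with the threading backend; add_to_total and sum_with_lock are hypothetical names, not part of joblib.

```python
import threading

import numpy as np
from joblib import Parallel, delayed


def add_to_total(value, totals, lock):
    # "+=" on a shared cell is a read-modify-write sequence, so two
    # threads could interleave; the lock makes the update atomic.
    with lock:
        totals[0] += value


def sum_with_lock(n_tasks=100):
    totals = np.zeros(1)
    lock = threading.Lock()
    # The threading backend passes totals and lock by reference,
    # so every task sees the same objects.
    Parallel(n_jobs=4, backend="threading")(
        delayed(add_to_total)(i, totals, lock) for i in range(n_tasks)
    )
    return float(totals[0])


if __name__ == "__main__":
    print(sum_with_lock())
```

The lock only works here because threads share it directly; a threading.Lock cannot be sent to worker processes.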

Also read up on the GIL (global interpreter lock): because of it, CPU-bound multithreading in Python is limited, unlike I/O-bound multithreading.

If you still need processes (e.g. because of the GIL), you can put the array into shared memory.

This is a somewhat more complicated topic, but the joblib manual also shows an example of joblib with numpy shared memory.
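As a rough illustration of the shared-memory route, the sketch below backs the array with a numpy.memmap file that the worker processes write into. It assumes joblib hands np.memmap arguments to workers as references to the same file rather than as copies; fill_row, run_with_memmap and the file name are hypothetical, and the sizes are kept tiny.

```python
import os
import tempfile

import numpy as np
from joblib import Parallel, delayed


def fill_row(repeat_index, shared_array):
    # shared_array is file-backed, so this write lands in the buffer
    # that the parent process has mapped as well.
    shared_array[repeat_index, :] = 2.0


def run_with_memmap(n_repeat=8, n_nodes=16):
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "shared_array.mmap")
        # mode="w+" creates a zero-filled file of the requested shape.
        shared_array = np.memmap(
            path, dtype=np.float64, shape=(n_repeat, n_nodes), mode="w+"
        )
        Parallel(n_jobs=2)(
            delayed(fill_row)(i, shared_array) for i in range(n_repeat)
        )
        mean_value = float(np.mean(shared_array))
        # Release the mapping before the temporary folder is removed.
        del shared_array
    return mean_value


if __name__ == "__main__":
    print(run_with_memmap())
```

As with threads, tasks that write to overlapping parts of the array would still need some form of synchronization.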

score 2 · Answer 2 · answered Oct 23 '17 at 07:29

As Sergey wrote in his answer, processes don't share state and memory. This is why you don't see the expected answer.

Threads share state and memory space, as they run under the same process. This is useful if you have many I/O operations. It won't get you more processing power (more CPUs), because of the GIL.

One technique to communicate between processes is proxy objects using a Manager. You create a manager object, which synchronizes resources between the processes.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

I haven't tested this code (I don't have all the modules you use), and it might require more modifications, but using a Manager object it should look something like this:

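if __name__ == "__main__":
    G = nx.erdos_renyi_graph(100000, 0.99)
    nRepeat = 5000

    # The manager's server process holds the list; workers get a proxy to it.
    manager = multiprocessing.Manager()
    numpys = manager.list(np.zeros([nRepeat, G.number_of_nodes()]))

    # Note: a nested write such as numpys[i][u] = 2 only changes a local copy
    # of row i; core_func would have to assign the whole row back (numpys[i] = row).
    Parallel(n_jobs=4)(delayed(core_func)(repeat_index, G, numpys) for repeat_index in range(nRepeat))
    print(np.mean(numpys))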

The data structure there is semantically a list of lists of floats (a matrix/table), but actually is an instance of `numpy.array` of `numpy.array`s of `numpy.float64` values. You will have a lot of trouble syncing these custom data types via the default manager, which supports only few scalar values, native lists & dicts. — Sergey Vasilyev, Oct 23 '17 at 17:39
