
I'm creating a 58 GB graph in a Python program, which I then want to process through multiple (Linux) processes. For that I'm using a ProcessPoolExecutor with a fork context, passing each process just an integer that identifies the chunk it should work on.

    import concurrent.futures
    import multiprocessing as mp

    with concurrent.futures.ProcessPoolExecutor(mp_context=mp.get_context('fork')) as executor:
        futures = []
        for start in range(0, len(vertices_list), BATCH_SIZE):
            future = executor.submit(process_batch, start)
            futures.append(future)

The process_batch function calculates a metric of the graph's vertices, and returns a small list of results. Both vertices_list and graph are global variables. The graph variable is bound to an object from a C++ Python extension module that implements the required data structure and calculation.

    def process_batch(start):
        # Compute the metric for one batch of vertices; the large
        # vertices_list and graph objects are inherited via fork(2).
        results = []
        for v in vertices_list[start:start + BATCH_SIZE]:
            results.append((v, graph.some_metric(v)))
        return results
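
For reference, the per-batch results would be gathered from the futures roughly as follows (a sketch; the code above omits this step):

    # Sketch (not in the original code): collect the (vertex, metric)
    # pairs as each batch completes.
    for future in concurrent.futures.as_completed(futures):
        for v, metric in future.result():
            print(v, metric)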

This works fine with small samples, but when I run it on the 58 GB structure, the machine grinds to a halt, with memory getting swapped out so that the forked processes can bring in more memory in ¼ MB chunks. (Using strace(1) I saw a few such mmap(2) calls.) I saw two comments (here and here) indicating that in Python fork(2) effectively means copy on access, because merely accessing an object changes its reference count. This seems consistent with the behavior I'm observing: the forked processes are probably creating their own private copies of the data structure.
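
A tiny CPython-specific illustration of the mechanism (my own sketch, not from the linked comments): even a read-only reference rewrites the object's refcount field, dirtying the page it lives on, so fork(2)'s copy-on-write sharing is lost as soon as the child touches the object.

    # Sketch, CPython-specific: binding a new reference writes to the
    # object's header, so its page cannot stay shared after fork(2).
    import sys

    data = [0] * 1000
    print(sys.getrefcount(data))   # e.g. 2 (includes getrefcount's own temporary reference)
    alias = data                   # a mere read/bind bumps the count
    print(sys.getrefcount(data))   # e.g. 3 -- the object's memory was written to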

So my question is: does Python offer a way to share an arbitrary data structure among forked Python processes? I know about multiprocessing.shared_memory and the corresponding facilities, but these share flat byte buffers and the ctypes-based Array and Value types, not arbitrary Python objects.
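
For context, this is roughly what the shared_memory facility offers (a minimal sketch): a raw byte buffer shared by name between processes, with no way to place an arbitrary object graph in it.

    # Minimal sketch: multiprocessing.shared_memory shares raw bytes,
    # not Python object graphs.
    from multiprocessing import shared_memory

    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[:5] = b"hello"                    # parent writes into the buffer

    # A child process would attach to the same segment by name:
    child_view = shared_memory.SharedMemory(name=shm.name)
    print(bytes(child_view.buf[:5]))          # b'hello'

    child_view.close()
    shm.close()
    shm.unlink()                              # release the segment when done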

Diomidis Spinellis
  • Do you even have 58 GB + overhead of RAM available to Python? Can't you split your problem / data structure into smaller sub-problems that can then be worked on individually, similar to map-reduce? – luk2302 Jan 28 '23 at 12:45
  • Yes, the program runs fine as a single process. The computer has 96 GB of RAM. But I don't have access to many more such computers to run separate processes on them, so I'm trying to use its multiple cores. – Diomidis Spinellis Jan 28 '23 at 12:46
  • One possible way to do this is to pickle the graph data structure and then send it to each of the forked processes using a message passing library such as ZeroMQ. The processes can then unpickle the graph and process it without needing to create their own private copies. – Pren Ven Jan 28 '23 at 13:16
  • @PrenVen: To unpickle the graph and process it they would need to create their own private copies. – Diomidis Spinellis Jan 28 '23 at 14:15
  • I ended up rewriting the code in C++ using simply `for_each(execution::par_unseq,`. The program came out at 209 lines against Python's 136 and is 50 times faster than an older unoptimized Python version. – Diomidis Spinellis Jan 28 '23 at 21:45

0 Answers