
I would like to parallelize a process in Python that needs read access to several large, non-array data structures. What is the recommended way to do this without copying all of the large data structures into every new process?

Thank you

user77463
  • [multiprocessing - sharing a complex object](https://stackoverflow.com/questions/20955683/multiprocessing-sharing-a-complex-object) may help. – martineau Oct 15 '20 at 16:11

1 Answer

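The multiprocessing package provides two ways of sharing state: shared memory objects and server process managers. For non-array data structures, a server process manager is the better fit: shared memory objects (Value and Array) hold only ctypes data, while managers support arbitrary picklable object types.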
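
To make the distinction concrete, here is a minimal sketch contrasting the two mechanisms. It assumes a single child process, and the names `worker` and `run_demo` are hypothetical, chosen only for illustration.

```python
from multiprocessing import Array, Manager, Process, Value


def worker(counter, numbers, records):
    """Update all three shared objects from a child process."""
    with counter.get_lock():      # Value carries its own lock by default
        counter.value += 1
    numbers[0] = 99               # Array holds one fixed C type ("i" = int)
    # A manager dict accepts any picklable object as a value
    records["child"] = ("any", "picklable", {"object": True})


def run_demo():
    counter = Value("i", 0)           # shared memory, ctypes data only
    numbers = Array("i", [1, 2, 3])   # shared memory, ctypes data only
    with Manager() as manager:
        records = manager.dict()      # proxy to a dict in the server process
        proc = Process(target=worker, args=(counter, numbers, records))
        proc.start()
        proc.join()
        # Copy the results out before the manager shuts down
        return counter.value, numbers[:], records.copy()


if __name__ == "__main__":
    print(run_demo())
```

The shared memory objects avoid a server round trip but are limited to C-style values, which is why they do not suit the non-array structures in the question.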

The following program makes use of a server process manager:

#!/usr/bin/env python3

from multiprocessing import Process, Manager

# Simple data structure (must be picklable to pass through the manager)
class DataStruct:
    def __init__(self, data_id, data_str):
        self.data_id = data_id
        self.data_str = data_str

    def __str__(self):
        return f"{self.data_str} has ID {self.data_id}"

    def __repr__(self):
        return f"({self.data_id}, {self.data_str})"

    def set_data_id(self, data_id):
        self.data_id = data_id

    def set_data_str(self, data_str):
        self.data_str = data_str

    def get_data_id(self):
        return self.data_id 

    def get_data_str(self):
        return self.data_str


# Print every DataStruct whose string matches find_str (read-only access)
def manipulate_data_structs(data_structs, find_str):
    for ds in data_structs:
        if ds.get_data_str() == find_str:
            print(ds)

# Create the manager context and share the data with the workers.
# The __main__ guard is needed where child processes re-import this
# module (the "spawn" start method, the default on Windows and macOS).
if __name__ == "__main__":
    with Manager() as manager:

        # List of DataStruct objects held by the manager's server process
        shared_list = manager.list([
            DataStruct(32, "Andrea"),
            DataStruct(45, "Bill"),
            DataStruct(21, "Claire"),
        ])

        # Processes that look for DataStructs with a given string
        procs = [
            Process(target=manipulate_data_structs, args=(shared_list, "Andrea")),
            Process(target=manipulate_data_structs, args=(shared_list, "Claire")),
            Process(target=manipulate_data_structs, args=(shared_list, "David")),
        ]

        for proc in procs:
            proc.start()

        for proc in procs:
            proc.join()
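
One caveat worth checking for large structures: as the multiprocessing documentation describes it, a manager keeps the data in its server process and sends a pickled copy of each item to whichever process reads it. Each worker therefore copies only the items it touches, but changes made to a retrieved item do not propagate back unless the item is reassigned. A minimal sketch of that behavior, using plain dicts to stay self-contained (`copy_on_access_demo` is a hypothetical name):

```python
from multiprocessing import Manager


def copy_on_access_demo():
    with Manager() as manager:
        shared = manager.list([{"id": 32, "name": "Andrea"}])

        item = shared[0]          # a pickled copy arrives in this process
        item["id"] = 99           # changes only the local copy
        before = shared[0]["id"]  # the server still holds the old value

        shared[0] = item          # reassigning sends the update to the server
        after = shared[0]["id"]
        return before, after


if __name__ == "__main__":
    print(copy_on_access_demo())
```

For read-only workloads like the one in the question this is harmless, though every access still goes through the server process, so it may be worth measuring with realistic data sizes.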

For more information, see "Sharing state between processes" in the multiprocessing documentation.

Ken
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - [From Review](/review/low-quality-posts/27397588) – Akhilesh Mishra Oct 15 '20 at 17:29
  • Good point, I’ll try to get a simple example together. – Ken Oct 15 '20 at 17:40