
I have an algorithm that operates on a large graph structure that I'd like to make multithreaded for better performance. None of the methods I've looked at quite fits what I want: I would like the graph to exist in shared memory which all of the processes can read and write to (using locks to prevent race conditions). Essentially, I would like something that behaves like OpenMP in C, where all the memory is accessible by each thread.

I started by looking at the threading module, but the GIL means that the performance increase is insignificant.

I proceeded to try the multiprocessing module, as suggested by most of the posts I've found on this topic (e.g. how can I share a dictionary across multiple processes? and Shared-memory objects in python multiprocessing). There are two main problems with this.

First, it seems as though multiprocessing doesn't work well with complicated objects. Consider the following toy problem: I have a list of integers and would like to multiply all of them by 10, then output all the numbers in arbitrary order. I can use the following code:

from multiprocessing import Manager, Process

def multiply_list():
    manager = Manager()
    output = manager.list()  # proxy to a list held by the manager process

    for v in range(10):
        output.append(v)
    print([str(v) for v in output])

    # Each worker multiplies its half of the list in place.
    # (A nested function only works as a Process target with the "fork"
    # start method; on Windows it must be defined at module level.)
    def process(inputs, start, end):
        while start < end:
            inputs[start] *= 10
            start += 1

    t1 = Process(target=process, args=(output, 0, 5))
    t2 = Process(target=process, args=(output, 5, 10))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    print([str(v) for v in output])

with output:

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
['0', '10', '20', '30', '40', '50', '60', '70', '80', '90']

However, if I instead have a list of objects, and modify the objects:

class Container(object):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return "C" + str(self.value)

def multiply_containers():
    manager = Manager()
    output = manager.list()

    for v in range(10):
        output.append(Container(v))
    print([str(v) for v in output])

    def process(inputs, start, end):
        while start < end:
            # inputs[start] is a pickled copy made by the proxy, so this
            # mutation never reaches the list in the manager process.
            inputs[start].value *= 10
            start += 1

    t1 = Process(target=process, args=(output, 0, 5))
    t2 = Process(target=process, args=(output, 5, 10))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    print([str(v) for v in output])

There is no change, because the proxy's __getitem__ returns a pickled copy of each Container: inputs[start].value *= 10 mutates that local copy, which is never written back to the manager process.

['C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9']
['C0', 'C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9']
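
One workaround (my own sketch, not from either linked post) is to read the element out, mutate the local copy, and assign it back through the proxy, since __setitem__ does send the updated object to the manager process. For example, the nested process function could be replaced with:

def process(inputs, start, end):
    while start < end:
        c = inputs[start]   # the proxy hands back a pickled copy
        c.value *= 10       # mutate the local copy
        inputs[start] = c   # reassign so the manager's list is updated
        start += 1

The two ranges here don't overlap, but this read-modify-write sequence is not atomic, so overlapping access would still need a lock.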

Second, the SO posts I linked suggest that trying to write to the shared data structure makes a copy of it, which I don't want.

To clarify the algorithm itself, the first step (building up the graph) works something like this: I have a list of sentences, each a sequence of words. I would like to build a directed graph where each vertex is a word, with out-edges going to each word that follows it in some sentence. For example, if my input is "the cat in the hat" and "the cat in the house", my output graph would be the => cat => in => the => hat, house (that is, the graph branches at the second "the", so the vertex "the" has out-edges to "hat" and "house" in addition to "cat"). I also keep track of some auxiliary information, such as how common each sentence or word is. Each vertex has a list of in- and out-edges and some attributes.
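
For concreteness, the building step looks something like this minimal single-threaded sketch (the exact attributes are simplified for illustration):

class Vertex(object):
    def __init__(self, word):
        self.word = word
        self.count = 0          # how often this word occurs
        self.out_edges = set()  # words that follow this one in some sentence
        self.in_edges = set()   # words that precede this one in some sentence

def build_graph(sentences):
    vertices = {}
    for sentence in sentences:
        words = sentence.split()
        for word in words:
            vertices.setdefault(word, Vertex(word)).count += 1
        for prev, nxt in zip(words, words[1:]):
            vertices[prev].out_edges.add(nxt)
            vertices[nxt].in_edges.add(prev)
    return vertices

graph = build_graph(["the cat in the hat", "the cat in the house"])
print(sorted(graph["the"].out_edges))  # ['cat', 'hat', 'house']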

I found a module that might work (http://poshmodule.sourceforge.net/posh/html/), but I'm not sure if there's a "canonical" or recommended way to do this sort of thing.

Thanks!

  • Don't be scared of the GIL and use threads. The overhead of pickling and unpickling the data, which would happen with multiprocessing, will probably make it worse and unnecessarily complicated. – Alexandru Chirila Jan 02 '14 at 23:04
  • The what? Overhead due to pickling won't even be noticeable unless communicating is all these processes do (which I doubt, i.e. there's probably an input/output relation). Secondly: threads are pointless when dealing with CPU-bound tasks. The GIL *is* a problem. It would be more efficient to do everything in a single thread. – freakish Jan 02 '14 at 23:41
  • possible duplicate of [Sharing object (class instance) in python using Managers](http://stackoverflow.com/questions/11951750/sharing-object-class-instance-in-python-using-managers) (try registering your own manager). – freakish Jan 02 '14 at 23:43

1 Answer


Here's sample code (that works) which uses a separate Manager process to control access to the shared data structure. It's based on your example code plus the code in the question Sharing object (class instance) in python using Managers, which @freakish suggested in a comment might be a duplicate -- it's not clear to me whether it is, but the overall approach seems like it might solve your problem.

from multiprocessing import Process
from multiprocessing.managers import BaseManager

class Container(object):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return "C" + str(self.value)
    def multiply(self, factor):  # added method
        self.value *= factor

def process(inputs, start, end):
    for i in range(start, end):
        inputs.apply(i, 'multiply', (10,))

class ListProxy(object):
    def __init__(self):
        self.nl = []
    def append(self, x):
        self.nl.append(x)
    def __getitem__(self, key):
        return self.nl[key]
    def __iter__(self):
        return iter(self.nl)
    def apply(self, i, method, args, **kwargs):
        # Runs the method call inside the manager process, so the stored
        # object itself is mutated -- no stale local copies.
        getattr(self.nl[i], method)(*args, **kwargs)

class ListManager(BaseManager):
    pass

ListManager.register('ListProxy', ListProxy,
                     exposed=['append', '__getitem__', '__iter__', 'apply'])

def main():
    manager = ListManager()
    manager.start()
    output = manager.ListProxy()

    for v in range(10):
        output.append(Container(v))
    print([str(v) for v in output])

    t1 = Process(target=process, args=(output, 0, 5))
    t2 = Process(target=process, args=(output, 5, 10))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    print([str(v) for v in output])

if __name__ == '__main__':
    main()
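
If all goes well, the second print should show the updated values, ['C0', 'C10', 'C20', 'C30', 'C40', 'C50', 'C60', 'C70', 'C80', 'C90'], since apply forwards each method call into the manager process where the real Container objects live, instead of mutating a pickled copy.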
martineau
  • Thank you! One concern I have with this: when I tried running the code with very large lists and many processes, performance didn't seem to increase much. On a quad-core machine, one process caps out at around one core's worth of CPU utilization, and the rest have low utilization (less and less with more processes). I suspect that the manager is ensuring thread safety by bottlenecking access to its objects. Is there any way to get around this? I would like the workers to be completely independent. – uncloakirehut Jan 03 '14 at 20:21
  • Access to shared resources in any multitasking/multithreaded environment has to be coordinated, and sometimes the overhead of doing so negates any gains from parallel processing. Sometimes altering the processing algorithm can make it more amenable to the method. To illustrate: I noticed that in your example code the processes operate on separate, non-overlapping portions of the list -- so perhaps dividing it into two or more separate pieces, processing each of them in parallel with no access restrictions, and then putting them back together afterwards would be a better approach (a sketch of that idea follows after these comments). – martineau Jan 03 '14 at 23:17
  • BTW, the accepted answer to [_Python multiprocessing- sharing a complex object_](http://stackoverflow.com/a/20958036/355230) also offers some good insights along with a good explanation of how this approach works. – martineau Jan 06 '14 at 21:21
  • P.S. It also suggests a better way to do things if you're on a "Linux-y" OS. – martineau Jan 06 '14 at 21:28
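
Here's a minimal sketch of the split-and-recombine idea from the comment above (an illustration with assumed names like multiply_chunk, not code from the answer): each worker gets its own pickled slice, mutates it with no shared state, and the parent puts the results back together.

from multiprocessing import Pool

class Container(object):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return "C" + str(self.value)

def multiply_chunk(chunk):
    # Each worker owns its pickled copy of the slice outright,
    # so no locks or manager are needed.
    for c in chunk:
        c.value *= 10
    return chunk

def main():
    data = [Container(v) for v in range(10)]
    with Pool(2) as pool:  # one worker per slice
        halves = pool.map(multiply_chunk, [data[:5], data[5:]])
    data = halves[0] + halves[1]  # recombine the processed pieces
    print([str(c) for c in data])

if __name__ == '__main__':
    main()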