How do I make Python respect iterable fields when multiprocessing?

Question

Apologies if this is a dumb question, but I've not found an elegant workaround for this issue yet. Basically, when using the concurent.futures module, non-static methods of classes look like they should work fine, I didn't see anything in the docs for the module that would imply they wouldn't work fine, and the module produces no errors when running - and even produces the expected results in many cases!

However, I've noticed that the module seems to not respect updates to iterable fields made in the parent thread, even when those updates occur before starting any child processes. Here's an example of what I mean:

import concurrent.futures


class Thing:
    data_list = [0, 0, 0]
    data_number = 0

    def foo(self, num):
        return sum(self.data_list) * num

    def bar(self, num):
        return num * self.data_number


if __name__ == '__main__':
    thing = Thing()
    thing.data_list[0] = 1
    thing.data_number = 1

    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(thing.foo, range(3))
        print('result of changing list:')
        for result in results:
            print(result)

        results = executor.map(thing.bar, range(3))
        print('result of changing number:')
        for result in results:
            print(result)

I would expect the result here to be

result of changing list:
0
1
2
result of changing number:
0
1
2

but instead I get

result of changing list:
0
0
0
result of changing number:
0
1
2

So for some reason, things work as expected for the field that's just an integer, but not at all as expected for the field that's a list. The implication is that the updates made to the list are not respected when the child processes are called, even though the updates to the simpler fields are. I've tried this with dicts as well with the same issue, and I suspect that this is a problem for all iterables.

Is there any way to make this work as expected, allowing for updates to iterable fields to be respected by child processes? It seems bizarre that multiprocessing for non-static methods would be half-implemented like this, but I'm hoping that I'm just missing something!

Is there a reason you are working with class attributes instead of instance attributes? — Klaus D., May 10 '21 at 11:08
Running in debug the list and scalar update outside the class, then the list reverts and the scalar does not inside the class. I will be interested to see an explanation. — jwal, May 10 '21 at 11:36
This might help https://stackoverflow.com/questions/3434581/accessing-a-class-member-variables-in-python/3434596#3434596 — jkr, May 10 '21 at 11:49
@KlausD. I was not aware of the difference between class and instance attributes before now, but that seems to be the fundamental cause of the issue. — Skullsploder, May 10 '21 at 15:03

score 5 · Accepted Answer · answered May 10 '21 at 13:24

The problem has nothing to do with "respecting iterable fields", but it is a rather subtle issue. In your main process you have:

thing.data_list[0] = 1 # first assignment
thing.data_number = 1 # second assignmment

Rather than:

Thing.data_list[0] = 1 # first assignment
Thing.data_number = 1 # second assignment

As far as the first assignment is concerned, there isn't any material difference because with either version you are not modifying a class attribute but rather an element within a list that happens to be referenced by a class attribute. In other words, Thing.data_list is still pointing to the same list; this reference has not been changed. This is an important distinction.

But in the second assignment with your version of the code you have essentially modified a class attribute via the instance's self reference. When you do that, you are creating a new instance attribute with the same name data_number.

Your class members functions foo and bar are attempting to access class attributes via self. The Thing instance, thing will be pickled across to the new address space but in the new address space when the Thing is un-pickled, by default new class attributes will be created and initialized to their default values unless you add special pickle rules. But instance attributes should be successfully transmitted, such as your newly created data_number. And that's why the 'result of changing number:' prints out as you expected, i.e. you are actually accessing the instance attribute data_number in bar.

Change bar to the following and you will see that everything will print out as 0:

    def bar(self, num):
        return num * Thing.data_number

Thanks a bunch, this cleared things up quite a bit. In short, to get the behaviour I expected, I just needed to set the fields within an __init__ method, so that new lists etc. would be generated for each object of the class. Quite a subtle change, but solved the problem completely. I'm guessing the only time one should set attributes in the main body of the class is when they're going to be final? — Skullsploder, May 10 '21 at 14:56
You can modify class attributes in general but don't expect them to be automatically propagated to another address space as in when you are implicitly pickling instances of the class as arguments to a multiprocessing worker function. And if you are setting the values within an `__init__` method, why not just use instance attributes? — Booboo, May 10 '21 at 16:00

How do I make Python respect iterable fields when multiprocessing?

1 Answers1