
I'm trying to spawn child processes which return their results via a passed dict argument.

It appears that after Process.start() is called, the passed dict is copied in some form, because a change in one is not reflected in the other. Yet in both the parent and the child process, id() returns the same value.

From this article, I would expect id() to return a unique value for each object: https://www.programiz.com/python-programming/methods/built-in/id

The id() function returns identity of the object. This is an integer which is unique for the given object and remains constant during its lifetime.

import json
from multiprocessing import Process
from time import sleep

def my_format(obj):
    return 'id(obj):' + str(id(obj)) + '; obj:' + json.dumps(obj, indent=4)

def work(result):
    result['child'] = 'only'
    sleep(5)
    # child does not see entry from parent, must be different object
    # ie missing result['parent'] == 'only'
    print('child thread: ' + my_format(result))

# The __main__ guard keeps the 'spawn' start method (Windows, macOS 3.8+)
# from re-running this block when the child re-imports the module.
if __name__ == '__main__':
    result = {}
    p = Process(target=work, args=(result,))

    result['both'] = 'see'
    # With the 'fork' start method, the child gets a copy of the parent's
    # memory, including this dict and its id()
    p.start()
    result['parent'] = 'only'
    sleep(5)
    p.join()
    # parent does not see entry from child, must be different object
    # ie missing result['child'] == 'only'
    print('main thread: ' + my_format(result))

Unexpectedly, the child's result and the parent's result have diverged in content, i.e. changes in one are not reflected in the other, yet both report the same id():

child thread: id(obj):4385974824; obj:{
    "both": "see",
    "child": "only"
}
main thread: id(obj):4385974824; obj:{
    "both": "see",
    "parent": "only"
}
Razzle Shazl
  • Where do you have different objects? There is only the `result` dictionary – UnholySheep Jan 17 '19 at 21:01
  • The prints show they have different contents – Razzle Shazl Jan 17 '19 at 21:01
  • You *modify* the object between `print` calls, but you do not *create* a new object – UnholySheep Jan 17 '19 at 21:02
  • The `sleep` doesn't matter - you still only have **one** object, so its `id` obviously stays the same – UnholySheep Jan 17 '19 at 21:07
  • How can I get different print results from the same object? – Razzle Shazl Jan 17 '19 at 21:07
  • The object `result` is mutable. Changing the contents of a dictionary doesn't change its identity. – Fred Larson Jan 17 '19 at 21:08
  • why do you believe the `result` object in the `work()` function is a different object from the one on the last line...? – Rick Jan 17 '19 at 21:10
  • As you quote, "The id() [...] remains constant during its lifetime." The lifetime is between creation and destruction. The *contents* of it may change during the lifetime, but the id() won't. Think of objects like boxes; the same box may have different contents. If you put "post data" in the box, the box has different contents, but it's still the same box with the same unchanged id(). – Peteris Jan 17 '19 at 21:13
  • @Peteris When I modify the contents in the child process, the parent process does not see it, and vice versa. How is this the same object? – Razzle Shazl Jan 17 '19 at 21:14
  • The comments here did not look at the question carefully. Thank you @RickTeachey for getting to the core of the issue. – Razzle Shazl Jan 17 '19 at 21:27
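The point made in the comments (mutating a dict changes its contents, not its identity) can be checked within a single process. A minimal sketch; the variable names are illustrative only:

```python
# Mutating a dict in place changes its contents, not its identity.
result = {}
before = id(result)

result['both'] = 'see'
result['parent'] = 'only'
after_mutation = id(result)

# Building a new dict (even with equal contents) creates a different object.
snapshot = dict(result)

print(before == after_mutation)  # True: still the same object
print(snapshot == result)        # True: equal contents
print(snapshot is result)        # False: two distinct objects
```

This only covers one process, which is why it did not resolve the question: there the contents diverge across processes while the id() stays the same.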

1 Answer


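The object, everything about it INCLUDING THE ID, is copied to the child process. This is not the same as deepcopy, which would create a new object (with a new id) in the same process. It is the same object copied into another memory space: in CPython, id() is the object's memory address, and fork() gives the child a copy of the parent's entire address space, so the copy sits at the same (virtual) address and reports the same id().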
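A minimal sketch of this, assuming a POSIX system where the 'fork' start method is available (it is not on Windows). The helper names report_id and compare_ids are hypothetical. The child mutates its copy of the dict and sends back id() and the contents through a queue; the parent compares them with its own id() and with a deepcopy made in the parent:

```python
import copy
import multiprocessing as mp


def report_id(obj, queue):
    # Runs in the child: mutate the child's copy, then report its id() and contents.
    obj['child'] = 'only'
    queue.put((id(obj), dict(obj)))


def compare_ids():
    # Assumption: the 'fork' start method exists on this platform (POSIX only).
    ctx = mp.get_context('fork')
    result = {'both': 'see'}
    queue = ctx.Queue()
    p = ctx.Process(target=report_id, args=(result, queue))
    p.start()
    child_id, child_contents = queue.get()  # read before join() to avoid blocking
    p.join()

    clone = copy.deepcopy(result)  # a genuinely new object in the parent process
    return {
        'parent_id': id(result),
        'child_id': child_id,
        'parent_contents': dict(result),
        'child_contents': child_contents,
        'deepcopy_is_same': clone is result,
    }


if __name__ == '__main__':
    print(compare_ids())
```

Under 'fork' the two ids match while the contents differ, and the deepcopy is a distinct object. With the 'spawn' start method (the default on Windows, and on macOS since Python 3.8), the argument is pickled and rebuilt in a fresh interpreter, so the ids generally will not match.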

See this answer for more information:

python multiprocessing arguments: deep copy?

Rick
  • Thank you Rick, this must be it! – Razzle Shazl Jan 17 '19 at 21:19
  • There are extra relevant resources in https://kaushikghose.wordpress.com/2016/08/26/python-global-state-multiprocessing-and-other-ways-to-hang-yourself/ and https://stackoverflow.com/questions/11055303/python-multiprocessing-global-variable-updates-not-returned-to-parent/11056415#11056415 . In essence, this seems to be an artifact of how CPython implements multiprocessing: it uses OS fork() to clone the whole Python interpreter instance, its memory space, and thus also object IDs. – Peteris Jan 17 '19 at 21:21
  • Thank you for your help @Peteris. And please upvote the question (to bring it out of negative if you are reading this) as I think this is a valid confusion for developers. – Razzle Shazl Jan 17 '19 at 21:24
  • @Peteris I would think, though, that producing an identical copy of the memory space would be a spec for the behavior of `multiprocessing`, no matter how the implementation actually accomplishes it, i.e. having the same ids in the new memory space is probably a cross-implementation requirement. – Rick Jan 17 '19 at 21:38
  • @RazzleShazl I respectfully suggest improving the question, rather than asking for upvotes. I can see you have made some modifications with the TL;DR, but it would be better to write the question clearly. See How to Ask and how to create a Minimal, Complete, and Verifiable example. – Peter Wood Jan 18 '19 at 11:35
  • @PeterWood I appreciate the feedback. I have updated the question. – Razzle Shazl Jan 18 '19 at 17:43
  • @RickTeachey could you clarify the answer a bit? It almost sounds like you're saying that whereas `deepcopy` would create a new object, `fork()` does not, when in fact they are clearly different objects in _both_ cases. – Razzle Shazl Jan 18 '19 at 17:47
  • @RazzleShazl It depends on the definition of "same", I suppose. If we are in a multiverse, the RickTeachey in the universe next door would be the "same", just in a different universe. This is kind of what is going on here. – Rick Jan 18 '19 at 19:11
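If the goal from the question is to actually get results back through a dict, one common approach (not something the answer above proposes) is a dict created by a multiprocessing Manager: the real dict lives in a separate manager process, and parent and child each hold a proxy to it, so every write goes to the same place. A minimal sketch, again assuming the 'fork' start method; work mirrors the question's function and run_with_manager is a hypothetical name:

```python
import multiprocessing as mp


def work(result):
    # Writes through the proxy are forwarded to the manager's process.
    result['child'] = 'only'


def run_with_manager():
    # Assumption: POSIX 'fork' start method, as in the question's setup.
    ctx = mp.get_context('fork')
    with ctx.Manager() as manager:
        result = manager.dict()  # proxy to a dict held by the manager process
        result['both'] = 'see'
        p = ctx.Process(target=work, args=(result,))
        p.start()
        result['parent'] = 'only'
        p.join()
        return result.copy()  # plain dict snapshot, taken before the manager shuts down


if __name__ == '__main__':
    print(run_with_manager())
```

After join(), both the parent's and the child's entries are present. The trade-off is that every access is a round trip to the manager process, so a multiprocessing.Queue or the return values of a Pool are often simpler when the child only needs to hand back a result.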