I tried to create a new object in each process when using the multiprocessing module. However, something confuses me.

When I use the multiprocessing module, the id of the new object is the same in every process:

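import multiprocessing
from nltk.tag import StanfordNERTagger

def worker():
    # Stanford named entity tagger (model_path and stanford_ner_path are defined elsewhere)
    st = StanfordNERTagger(model_path, stanford_ner_path)
    print(id(st))    # all the processes print the same id

if __name__ == '__main__':
    for i in range(4):
        p = multiprocessing.Process(target=worker)
        p.start()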

But when I use threading, they are different:

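import threading
from nltk.tag import StanfordNERTagger

def worker():
    # Stanford named entity tagger (model_path and stanford_ner_path are defined elsewhere)
    st = StanfordNERTagger(model_path, stanford_ner_path)
    print(id(st))    # threads print different ids

for i in range(4):
    p = threading.Thread(target=worker)
    p.start()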

I am wondering why they are different.

Minwei Shen

1 Answer

In CPython, id() returns the memory address of the given object. Because threads share a single address space, two instances that are alive at the same time must be allocated at two different locations, so they return two different ids (that is, two different virtual addresses).
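As a minimal sketch of the threaded case, the snippet below creates one object per thread and keeps all of them alive in a shared list, so none of their addresses can be recycled. The `Tagger` class and the `collect_thread_ids` helper are hypothetical stand-ins introduced only for illustration; they are not part of the question's code.

```python
import threading


class Tagger(object):
    """Hypothetical stand-in for an expensive object such as StanfordNERTagger."""


def collect_thread_ids(n=4):
    objects = []              # keeps every instance alive, so no address is reused
    lock = threading.Lock()

    def worker():
        obj = Tagger()
        with lock:
            objects.append(obj)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [id(obj) for obj in objects]


thread_ids = collect_thread_ids()
# Objects that exist at the same time in one process always have distinct ids.
print(len(set(thread_ids)) == len(thread_ids))
```

If the objects were not kept alive, a later thread could legitimately see an id that an earlier, already finished thread had printed.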

This is not the case for separate processes, each of which owns its own address space. Nothing guarantees that they get the same address, but since every worker starts from the same state and runs the same code, they commonly do.

Keep in mind that these addresses are virtual: each one is meaningful only within the address space of the process that owns it. That is why the same value can show up in several processes while referring to different objects.
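One way to see this is to report the process id next to the object id. The sketch below assumes Python 3 on a Unix-like system where the "fork" start method is available; `Tagger`, `report` and `collect_process_ids` are hypothetical names used only for this illustration. Whether the addresses actually coincide depends on the platform and allocator, so no particular output is claimed.

```python
import multiprocessing
import os


class Tagger(object):
    """Hypothetical stand-in for an expensive object such as StanfordNERTagger."""


def report(queue):
    obj = Tagger()
    # Send back which process created the object and at which virtual address.
    queue.put((os.getpid(), id(obj)))


def collect_process_ids(n=4):
    # Assumption: a Unix-like OS, where the "fork" start method exists.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=report, args=(queue,)) for _ in range(n)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]   # one (pid, address) pair per child
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for pid, address in collect_process_ids():
        print(pid, hex(address))
```

The pids always differ, which shows the objects live in different processes even when the printed addresses match.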

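It is usually better not to rely on id() for distinguishing objects: an id is only unique among objects that exist at the same time, so a new object can get the id of an old one that has been freed, which makes objects hard to track over time. This usually leads to tricky bugs.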
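A possible alternative, sketched here under the assumption that you control the class being tracked, is to give each instance an explicit serial number. The `Tracked` class and its `serial` attribute are hypothetical names, not part of any library.

```python
import itertools

_serials = itertools.count()


class Tracked(object):
    """Hypothetical class whose instances carry an explicit, never-reused serial."""

    def __init__(self):
        self.serial = next(_serials)


first = Tracked()
first_id, first_serial = id(first), first.serial
del first                     # the object is freed; its address may be recycled

second = Tracked()
print(id(second) == first_id)           # may be True in CPython, not guaranteed
print(second.serial == first_serial)    # the serial is never handed out twice
```

The serial is only unique within one process; across processes it could be combined with os.getpid().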

noxdafox
  • Thank you! Now I got it. – Minwei Shen Nov 14 '15 at 21:21
  • please mark the answer as correct, if so, so that other people can read it – noxdafox Nov 15 '15 at 00:04
  • Is there any better way to get the (physical) address of an object/variable in Python? I need to know exactly how memory allocation happens in Python multiprocessing. I know Linux fork() is copy-on-write, but it would help me understand multiprocessing better if I could somehow track the object addresses. – Yogesh Yadav Sep 05 '17 at 00:43
  • I am afraid there's no easy way to get the actual physical address of an object in Python. You could wrap some low-level Linux functions such as [`virt_to_phys`](http://elixir.free-electrons.com/linux/v3.12/source/arch/x86/include/asm/io.h#L111) in ctypes, but I believe it wouldn't be *that* simple. Moreover, trying to understand a high-level language's memory allocation by reading physical addresses is probably the most frustrating path you could follow. You would need to deal with OS memory paging, caching and several other corner cases. – noxdafox Sep 05 '17 at 13:53