3

Complete code here: https://gist.github.com/mnjul/82151862f7c9585dcea616a7e2e82033

Environment is Python 2.7.6 on an up-to-date Ubuntu 14.04 x64.

Prologue: Well, I got this strange piece of code in my work project, and it's a classic "somebody wrote it and quit the job, and it works but I don't know why" piece, so I decided to write a stripped-down version of it, hoping to get my questions clarified/answered. Please kindly check the referred gist.

Situation: So, I have a custom class Storage inheriting from Python's thread-local storage (threading.local), intended to book-keep some thread-local data. There is only one instance of that class, instantiated in the global scope before any threads have been constructed. Since there is only one Storage instance and its __init__() should run only once, I would expect the Runner threads not to have thread-local storage at all, so their data accesses would clash.

However, this turned out to be wrong: the code output (see my comment at that gist) indicates that each thread does have its own local storage. Strangely, at each thread's first access to the storage object (i.e. a set()), Storage.__init__() is mysteriously run, thus properly creating the thread-local storage and producing the desired effect.
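For readers who cannot open the gist, here is a minimal sketch of the situation described above. It is not the gist itself: the method bodies, the number of Runner threads, and the INIT_LOG list are assumptions made for illustration.

```python
import threading

INIT_LOG = []  # hypothetical helper: names of the threads that ran __init__


class Storage(threading.local):
    def __init__(self):
        # Record which thread is running __init__, then set up the data dict.
        INIT_LOG.append(threading.current_thread().name)
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


# The only instance, created in the global scope before any thread exists.
storage = Storage()


class Runner(threading.Thread):
    def __init__(self, value):
        threading.Thread.__init__(self)
        self.value = value
        self.seen = None

    def run(self):
        # First access from this thread: Storage.__init__ runs again here.
        storage.set('keykey', self.value)
        self.seen = storage.get('keykey')


runners = [Runner(i) for i in range(3)]
for r in runners:
    r.start()
for r in runners:
    r.join()

# One __init__ call for the creating thread plus one per Runner.
init_calls = len(INIT_LOG)
# The creating thread never called set(), so it sees no value.
main_value = storage.get('keykey')
```

Each Runner reads back only the value it wrote, and the creating thread sees none of them.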

Questions: Why on earth did Storage.__init__ get invoked when the threads attempted to call a member function of a seemingly already-instantiated object? Is this a CPython (or PThread, if that matters) implementation detail? I feel like there are a lot of things happening between my stack trace's "py_thr_local.py", line 36, in run => storage.set('keykey', value) and "py_thr_local.py", line 14, in __init__, but I can't find any relevant piece of information in (C)Python's source code, or on Stack Overflow.

Any feedback is welcome. Let me know if I need to clarify things or provide more information.

Mnjul

2 Answers

2

That's part of the contract (from http://svn.python.org/projects/python/tags/r27a1/Lib/_threading_local.py):

Note that if you define an __init__ method, it will be called each time the local object is used in a separate thread.

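It's not too well documented in the official docs, but basically each time you interact with a thread local from a different thread, a fresh set of attributes unique to that thread gets allocated, and __init__ is run again in that thread with the arguments originally passed to the constructor. The object itself stays the same; only its per-thread state is new.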
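A small sketch of that contract follows. The class name Tagged, the 'job' prefix, and the thread names are made up for the example; only threading.local and threading.Thread come from the standard library.

```python
import threading


class Tagged(threading.local):
    def __init__(self, prefix):
        # Runs once per thread that touches the object, with the same prefix.
        self.label = '%s-%s' % (prefix, threading.current_thread().name)


shared = Tagged('job')
main_label = shared.label  # label computed in the creating thread
results = {}


def worker():
    # Same object identity in every thread, but a per-thread label.
    results[threading.current_thread().name] = (id(shared), shared.label)


threads = [threading.Thread(target=worker, name='w%d' % i) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every worker reports the same id(shared), yet each one sees a label built by its own __init__ call.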

Oliver Dain
2

The first thing to consider is what a thread-local actually is: an independently initialized set of state of a particular type, tied to a particular thread. With that in mind, I would expect some initialization code to be called multiple times. In some languages, like Java, that initialization is more explicit, but it does not necessarily need to be.

Let's look at the source for the supertype of the storage container you're using: https://github.com/python/cpython/blob/2.7/Lib/_threading_local.py

Line 186 contains the local type that is being used. Taking a look at that class, you can see that __setattr__ and __getattribute__ are among the overridden methods. Remember that in Python these methods are called every time you attempt to assign or access an attribute on an instance. The implementations of these methods acquire a lock and then call the _patch function. _patch looks up the current thread's dictionary, creating a new one if the thread does not have one yet, and assigns it as the instance's __dict__ (going through the object base class to avoid infinite recursion; see "How is the __getattribute__ method used?").

So when you call storage.set(...), you are actually first looking up a per-thread dictionary for the current thread. If one doesn't exist, the __init__ method is called on your type (see line 182). The result of that lookup is substituted in as the instance's __dict__, and then the appropriate method is called on object to retrieve or set the attribute (l. 193, 206, 219), which uses the newly installed dict.
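The mechanism described above can be sketched in pure Python as follows. This is a simplified model written for illustration, not the actual _threading_local.py source: the names _SketchBase, SketchLocal, SketchStorage, the _sk_* slots, and INIT_THREADS are all hypothetical, and attribute deletion and cleanup are left out.

```python
import threading

INIT_THREADS = []  # hypothetical log of which threads ran __init__


class _SketchBase(object):
    # Slots live outside __dict__, so swapping __dict__ does not lose them.
    __slots__ = ('_sk_key', '_sk_args', '_sk_lock')

    def __new__(cls, *args, **kwargs):
        self = object.__new__(cls)
        key = 'sketch.local.%d' % id(self)
        object.__setattr__(self, '_sk_key', key)
        object.__setattr__(self, '_sk_args', (args, kwargs))
        object.__setattr__(self, '_sk_lock', threading.RLock())
        # Register the creating thread's dict now, so the ordinary __init__
        # call that follows __new__ is the only one made in this thread.
        own_dict = object.__getattribute__(self, '__dict__')
        threading.current_thread().__dict__[key] = own_dict
        return self


def _patch(self):
    # Install the calling thread's dict as the instance __dict__.
    key = object.__getattribute__(self, '_sk_key')
    thread_dict = threading.current_thread().__dict__
    d = thread_dict.get(key)
    if d is None:
        # First access from this thread: fresh dict, then re-run __init__.
        d = {}
        thread_dict[key] = d
        object.__setattr__(self, '__dict__', d)
        cls = type(self)
        if cls.__init__ is not object.__init__:
            args, kwargs = object.__getattribute__(self, '_sk_args')
            cls.__init__(self, *args, **kwargs)
    else:
        object.__setattr__(self, '__dict__', d)


class SketchLocal(_SketchBase):
    def __getattribute__(self, name):
        with object.__getattribute__(self, '_sk_lock'):
            _patch(self)
            return object.__getattribute__(self, name)

    def __setattr__(self, name, value):
        with object.__getattribute__(self, '_sk_lock'):
            _patch(self)
            return object.__setattr__(self, name, value)


class SketchStorage(SketchLocal):
    def __init__(self, tag):
        INIT_THREADS.append(threading.current_thread().name)
        self.tag = tag


storage = SketchStorage('demo')
storage.value = 'main'
seen = {}


def worker():
    seen['had_value'] = hasattr(storage, 'value')  # main's attribute is hidden
    seen['tag'] = storage.tag                      # set by the re-run __init__
    storage.value = 'worker'


t = threading.Thread(target=worker)
t.start()
t.join()
```

The lock is reentrant because __init__ itself assigns attributes, which re-enters __setattr__ while _patch is still running in the same thread.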

John H
  • Ah yes. So, in my case, as ``Storage`` claims to be the subclass of ``threading.local``, interface-wise it's actually natural to think it should support the behavior of ``threading.local``, so each thread, when accessing the ``storage`` object, is "transparently" allocated the thread-local data storage, just like how we would have expected ``threading.local`` to. I was kinda perplexed by Storage's subclassing of ``threading.local`` for sure. – Mnjul May 27 '16 at 05:34
  • Additionally, since my Python has thread module compiled/linked, my ``threading.local`` implementation is not from _threading_local.py but from the CPython's source code (this may be checked if you add some tracing code at the import statements at https://github.com/python/cpython/blob/2.7/Lib/threading.py#L1196-L1199 ). The CPython counterpart where ``__init__()`` is called is at https://github.com/python/cpython/blob/2.7/Modules/threadmodule.c#L452 at the moment. – Mnjul May 27 '16 at 05:34
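A quick way to check which implementation a given interpreter uses, as a sketch: the variable names are illustrative, and the module names in the comments are what I would expect from CPython rather than something guaranteed on every build.

```python
import threading

# Module that provides threading.local in this interpreter. On CPython builds
# with thread support this is expected to be the C module ('thread' on 2.x,
# '_thread' on 3.x); the pure-Python fallback is '_threading_local'.
impl_module = threading.local.__module__
uses_python_fallback = (impl_module == '_threading_local')
```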