
I have not worked with threading in Python at all, so I'm asking this question as a complete newcomer.

I am wondering if defaultdict is thread-safe. Let me explain it:

I have

from collections import defaultdict
d = defaultdict(list)

which creates an empty list for missing keys. Let's say I start multiple threads that all do this at the same time:

d['key'].append('value')

At the end, with two threads, I'm supposed to end up with ['value', 'value']. However, if defaultdict is not thread-safe, thread 1 might yield to thread 2 after checking whether 'key' is in the dict but before executing d['key'] = default_factory(). The threads would then interleave: thread 2 would create the list in d['key'] and perhaps append 'value' to it.

When thread 1 resumes, it will continue with d['key'] = default_factory(), which will destroy the existing list and its value, and we will end up with ['value'] instead of ['value', 'value'].
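The interleaving described in the question can be reproduced deterministically without real threads. This is a sketch under stated assumptions: naive_append is a hypothetical, deliberately non-atomic check-then-create helper on a plain dict (not how defaultdict is implemented), and a generator's yield stands in for the point where a thread switch could occur:

```python
def naive_append(d, key, value, factory=list):
    """Hypothetical non-atomic version of d[key].append(value) on a plain dict."""
    if key not in d:            # step 1: check for the key
        yield                   # a thread switch could happen right here
        d[key] = factory()      # step 2: create, possibly clobbering another list
    d[key].append(value)


d = {}
thread1 = naive_append(d, 'key', 'value')
thread2 = naive_append(d, 'key', 'value')

next(thread1)                   # "thread 1" sees 'key' missing, then is switched out
for _ in thread2:               # "thread 2" runs to completion
    pass
for _ in thread1:               # "thread 1" resumes and overwrites thread 2's list
    pass

print(d)  # {'key': ['value']}: one append was lost
```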

I looked at the CPython source code for defaultdict but could not find any locks or mutexes, so I guess it is not thread-safe unless the documentation says it is.

Some people on IRC last night said that Python has a GIL, so it is conceptually thread-safe. Others said threading should not be done in Python at all. I'm pretty confused. Ideas?

ballade4op52
ahmet alp balkan
  • https://groups.google.com/forum/#!topic/comp.lang.python/9ZnBQrYun1w may help –  Jul 16 '13 at 16:52

1 Answer


It is thread safe, in this specific case.

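To understand why, you need to know when Python switches threads. CPython only allows switching between threads between Python bytecode steps. This is where the GIL comes in: periodically the lock is released and a thread switch can take place. In Python 2 that happens every N bytecode instructions (set with sys.setcheckinterval()); since Python 3.2 it happens after a time interval instead (set with sys.setswitchinterval()).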
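On Python 3.2 and later, the point at which a running thread is asked to give up the GIL is governed by a time-based switch interval rather than a bytecode count. The standard sys module exposes it; a short sketch:

```python
import sys

# Seconds a thread may hold the GIL before being asked to release it
# (0.005 s by default on Python 3).
interval = sys.getswitchinterval()
print(interval)

# Shrinking the interval makes thread switches more frequent, which can help
# surface race conditions while testing.
sys.setswitchinterval(1e-6)
print(sys.getswitchinterval())

sys.setswitchinterval(interval)  # restore the original value
```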

The d['key'] code is handled by one bytecode (BINARY_SUBSCR) that triggers the .__getitem__() method to be called on the dictionary.
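You can check which bytecode a subscription compiles to with the standard dis module. Opcode names differ between CPython versions (newer releases may fold BINARY_SUBSCR into a generic BINARY_OP), so treat the exact output as version-dependent:

```python
import dis

# Disassemble the expression d['key'] (d need not exist; it is only compiled).
dis.dis("d['key']")

# Collect just the opcode names for a compact view.
opnames = [ins.opname for ins in dis.get_instructions("d['key']")]
print(opnames)
```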

A defaultdict configured with list as the default-value factory, and using string keys, handles the dict.__getitem__() call (including the __missing__() fallback that creates the missing value) entirely in C, and the GIL is never released, making d[key] lookups thread safe.
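A quick empirical check of that case, assuming CPython: several threads hammer the same defaultdict(list) key, and no append should be lost. A passing run is consistent with the reasoning above but cannot by itself prove thread safety, since races are timing-dependent:

```python
import threading
from collections import defaultdict

d = defaultdict(list)  # C-level factory and str keys: the case discussed above

N_THREADS = 8
N_APPENDS = 10_000


def worker(n):
    for _ in range(n):
        d['key'].append('value')


threads = [threading.Thread(target=worker, args=(N_APPENDS,)) for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(d['key']))  # N_THREADS * N_APPENDS if no append was lost
```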

Note the qualification there. If you create a defaultdict instance with a different default-value factory, one that uses Python code (lambda: [1, 2, 3], for example), all bets are off: the C code then calls back into Python code, and the GIL can be released again while the bytecode for the lambda function executes. The same applies to the keys: when using an object that implements either __hash__ or __eq__ in Python code, a thread switch can take place there. Finally, if the factory is written in C code that explicitly releases the GIL, a thread switch can take place and thread safety goes out the window.
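When the factory or the keys run Python code, the usual remedy is to stop relying on the GIL and guard the whole lookup-and-append with an explicit threading.Lock. A minimal sketch, where d_lock and safe_append are illustrative names:

```python
import threading
from collections import defaultdict

# Python-level factory: defaultdict's C code calls back into Python bytecode
# here, so GIL-based atomicity can no longer be assumed.
d = defaultdict(lambda: [1, 2, 3])
d_lock = threading.Lock()  # hypothetical name; one lock guards all access to d


def safe_append(key, value):
    # Hold the lock across the lookup, the possible factory call and the append.
    with d_lock:
        d[key].append(value)


threads = [threading.Thread(target=safe_append, args=('key', i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(d['key'])  # [1, 2, 3] followed by 0..3 in whatever order the threads ran
```

Every reader and writer of d has to take the same lock; a lock used by only some call sites protects nothing.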

Martijn Pieters
  • Since it's [apparently] not in the documentation, this sounds like just a CPython implementation detail -- still useful to know, though. – martineau Jul 16 '13 at 16:57
  • +1 for mentioning that Python-written factories can trigger a release of the GIL. Unfortunately, it gets even hairier: the GIL can potentially be released on any `Py_DECREF` if the object is freed and has a `__del__`. This way even pure C code can unwittingly cause a release of the GIL; admittedly pathological, but it can happen. – user4815162342 Jul 16 '13 at 17:27
  • Pretty interesting that developers should be aware of GIL releases as execution passes back and forth between C and Python code. Thanks. – ahmet alp balkan Jul 16 '13 at 18:52
  • Would this `defaultdict(lambda: defaultdict(lambda: False))` usage be thread safe, since none of the dict values are lists? – Krishna Oza Mar 20 '19 at 15:14
  • @darth_coder: that depends on the keys, because `hash(key)` can call out to Python if the key implements a Python `__hash__` or `__eq__` method. – Martijn Pieters Mar 20 '19 at 15:29
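The points raised in these comments can be observed directly: Python-level `__hash__`, `__eq__` and `__del__` methods run in the middle of what looks like a single dict operation, and any Python bytecode is a potential thread-switch point. A sketch, assuming CPython (the `__del__` timing relies on reference counting); `Key` and `Noisy` are hypothetical classes:

```python
from collections import defaultdict

calls = []


class Key:
    """Hypothetical key whose hashing and equality run as Python code."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        calls.append('__hash__')  # Python bytecode inside the dict lookup
        return hash(self.name)

    def __eq__(self, other):
        calls.append('__eq__')
        return isinstance(other, Key) and self.name == other.name


class Noisy:
    """Hypothetical value whose finalizer runs as Python code."""

    def __del__(self):
        calls.append('__del__')


d = defaultdict(lambda: defaultdict(lambda: False))
d[Key('a')]['b'] = True
print(d[Key('a')]['b'])  # True; the second Key('a') is found via __hash__ and __eq__

values = defaultdict(list)
values['key'] = Noisy()
values['key'] = []  # dropping the last reference runs Noisy.__del__ immediately on CPython

print(calls)
```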