I'm writing a dict-like MutableMapping class that performs some additional synchronization, and I would like to make it thread-safe. I can make most standard operations like __getitem__ and __setitem__ thread-safe by wrapping them in a lock. However, in-place binary operations seemingly cannot be controlled in this fashion, because those operations are ultimately performed on the object stored in the dict rather than on the dict itself. For example, as discussed in this post, d['foo'] += 1 ultimately expands to something like d.__setitem__('foo', d.__getitem__('foo').__iadd__(1)) (it both calls an in-place add and sets the value, since __iadd__ attempts to operate in place but also returns self, as documented here).
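To make the expansion above concrete, here is a minimal sketch of a per-operation-locked mapping that records which locked methods run during an in-place add. The class name LockedDict and its calls list are hypothetical, introduced only for illustration; they are not the actual class from this question. Note that int defines no __iadd__, so for integers the in-place add falls back to ordinary addition.

```python
import threading
from collections.abc import MutableMapping


class LockedDict(MutableMapping):
    """Hypothetical dict-like class that locks each individual operation."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()
        self.calls = []  # records each locked access, for illustration only

    def __getitem__(self, key):
        with self._lock:
            self.calls.append(("getitem", key))
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self.calls.append(("setitem", key, value))
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def __iter__(self):
        with self._lock:
            return iter(list(self._data))

    def __len__(self):
        with self._lock:
            return len(self._data)


d = LockedDict()
d["foo"] = 0
d.calls.clear()

d["foo"] += 1
print(d.calls)  # [('getitem', 'foo'), ('setitem', 'foo', 1)]
```

The lock is acquired and released twice, once for the read and once for the write, and the addition itself happens in between with no lock held. That gap is the window I am concerned about.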
I assumed that the same limitations would exist for standard dictionaries. To test this behavior, I wrote this simple snippet:
from concurrent.futures import ThreadPoolExecutor

def add_one(d):
    d['threadcount'] += 1  # Option 1
    # d['threadcount'] = d['threadcount'] + 1  # Option 2

x = {}
x['threadcount'] = 0
total = 10000
with ThreadPoolExecutor(max_workers=100) as executor:
    # list() forces the lazy map so that every task runs and any exception surfaces
    list(executor.map(add_one, [x] * total))
assert x['threadcount'] == total
Surprisingly, no matter what I set max_workers or total to, this test reliably succeeds for both versions of add_one shown above. To analyze this further, I looked at the bytecode for the add_one function:
>>> import dis
>>> dis.dis(add_one)
  2           0 LOAD_FAST                0 (d)
              2 LOAD_CONST               1 ('threadcount')
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_CONST               2 (1)
             10 INPLACE_ADD
             12 ROT_THREE
             14 STORE_SUBSCR
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE
As I would expect, this simple function in fact consists of many instructions, to account for the stack pushes necessary for the dict lookup, the in-place addition, and so on. Another Stack Overflow post provides further explanation that supports my expectation that this operation would not be thread-safe. The Python Language Reference explains that the += operator is a language construct, so I'm at a loss as to how the snippet I posted works as expected regardless. I don't see any way that the dict could hold the GIL for the entire duration of add_one, since other Python code needs to perform the integer addition I'm requesting, so how is Python avoiding the expected race condition, where x['threadcount'] would be read as the same value on many threads before any of them can modify it?
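For clarity, the interleaving I expected can be replayed by hand in a single thread. This is only a sketch of the hypothetical schedule, not something I have observed; the names seen_by_a and seen_by_b are made up for illustration and stand for the values two threads would hold after BINARY_SUBSCR but before STORE_SUBSCR.

```python
# Hand-written replay of the expected interleaving; no real threads are involved.
d = {"threadcount": 0}

# "Thread A" and "thread B" each perform the read half of d[k] += 1.
seen_by_a = d["threadcount"]  # A reads 0
seen_by_b = d["threadcount"]  # B also reads 0, before A has stored anything

# Each then performs the add and the store half.
d["threadcount"] = seen_by_a + 1  # A stores 1
d["threadcount"] = seen_by_b + 1  # B stores 1 again, overwriting A's update

print(d["threadcount"])  # 1, although two increments were requested
```

If this schedule ever occurred in the threaded test, the final count would fall short of total and the assertion would fail.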
Edit
I'm specifying this in case it is important, since I suspect that what I'm observing is an implementation detail: I'm running CPython 3.8.5 on a Mac.