
I'm wondering whether dict.update() is thread safe in Python. I've read the related questions, but none of them addresses my question exactly.

My question is very specific and simple. For example, I already have a local dictionary d2, and I simply need to update the global dictionary d with it, as shown below. d starts out empty and is filled by different threads. The d2 in each thread may have keys that overlap with d (I don't think this matters). Is this thread safe?

import dis

def f(d):
    d2 = {1:2, 3:4}
    d.update(d2)

dis.dis(f)  # dis.dis() prints the listing itself and returns None

The bytecode (from CPython 3.6; other versions emit different opcodes) looks like the following:

 10           0 LOAD_CONST               1 (2)
              2 LOAD_CONST               2 (4)
              4 LOAD_CONST               3 ((1, 3))
              6 BUILD_CONST_KEY_MAP      2
              8 STORE_FAST               1 (d2)

 11          10 LOAD_FAST                0 (d)
             12 LOAD_ATTR                0 (update)
             14 LOAD_FAST                1 (d2)
             16 CALL_FUNCTION            1
             18 POP_TOP
             20 LOAD_CONST               0 (None)
             22 RETURN_VALUE

It looks like CALL_FUNCTION at offset 16 is the single atomic instruction that updates the dictionary. So shouldn't it be thread safe?

Paolo
Tim
  • I don't think it is. See related [How to make built-in containers (sets, dicts, lists) thread safe?](https://stackoverflow.com/questions/13610654/how-to-make-built-in-containers-sets-dicts-lists-thread-safe) – martineau Feb 13 '19 at 23:27
  • `CALL_FUNCTION` is basically just a goto that remembers where to come back to. It has nothing to do with the atomicity of the function being called. – chepner Feb 14 '19 at 00:21
  • @chepner So what makes a call atomic? People say each bytecode instruction is atomic. I've never quite understood all this. – Tim Feb 14 '19 at 04:31
  • Thread switches occur at (and only at) bytecode boundaries, so anything taking more than one bytecode cannot be atomic without explicit locking. Individual bytecodes may or may not execute atomically, depending on how they're implemented. In your example, all _are_ atomic except for (possibly!) `LOAD_ATTR` and `CALL_FUNCTION`. There is no general principle at work here, only detailed implementation knowledge. More here: https://stackoverflow.com/questions/38266186/is-extending-a-python-list-e-g-l-1-guaranteed-to-be-thread-safe/38320815#38320815 – Tim Peters Feb 14 '19 at 05:55
  • `CALL_FUNCTION` just *starts* the call; you still have to consider the byte code (or its implementation in general) *of* the function itself. – chepner Feb 14 '19 at 12:53
  • So if the function being called is implemented in C and doesn't call any other Python code, it should be atomic, right? Are there major downsides to using locks? – Tim Feb 15 '19 at 03:54
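Rather than reasoning about which bytecodes happen to be atomic, the usual approach is an explicit lock. Below is a minimal sketch, assuming every piece of code that touches d goes through the same lock; the names d_lock and worker are hypothetical, not from the question. The main costs are that threads serialize on the lock while it is held, and every reader and writer has to remember to take it.

```python
import threading

d = {}                     # the shared "global" dict from the question
d_lock = threading.Lock()  # hypothetical: one lock guarding every access to d


def worker(start):
    # Each thread builds its own local dict, as in the question.
    # Ranges overlap between threads, but the values for a key always agree.
    d2 = {k: k * k for k in range(start, start + 1000)}
    # Holding the lock makes the whole update atomic with respect to any
    # other code that also takes d_lock, whatever the key types are.
    with d_lock:
        d.update(d2)


threads = [threading.Thread(target=worker, args=(i * 500,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(d))  # 4500: keys 0..4499
```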

2 Answers


If the keys are compositions of builtin hashable types, then generally "yes", .update() is thread-safe. In particular, for your example with integer keys, yes.

But in general, no. Looking up a key in a dict can invoke arbitrary user-defined Python code in user-supplied __hash__() and __eq__() methods, and those can do anything at all, including performing their own mutations on the dicts involved. As soon as the implementation invokes Python code, other threads can run too, including threads that may be mutating d and/or d2.

That's not a potential problem for the builtin hashable types (ints, strings, floats, tuples, ...) - their implementations to compute hash codes and decide equality are purely functional (deterministic and no side effects) and don't release the GIL (global interpreter lock).
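To make the user-defined-code case concrete, here is a small sketch using a hypothetical TracingKey class (not from the answer) whose __hash__ and __eq__ log every call. When d2 holds a key that is equal to, but not the same object as, a key already in d, CPython's merge has to call the Python-level __eq__ to decide that the two keys match. Any Python code running at that point is a place where another thread could be scheduled.

```python
class TracingKey:
    """Hypothetical key type whose __hash__ and __eq__ are plain Python code."""

    log = []  # records each call the dict machinery makes into these methods

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        TracingKey.log.append("hash")
        return hash(self.value)

    def __eq__(self, other):
        TracingKey.log.append("eq")
        return isinstance(other, TracingKey) and self.value == other.value


d = {TracingKey(1): "old"}   # the "global" dict already holds a key equal to 1
d2 = {TracingKey(1): "new"}  # a *different* object that compares equal to it
TracingKey.log.clear()       # ignore the calls made while building d and d2

d.update(d2)                 # the merge calls back into TracingKey.__eq__

print(TracingKey.log)
print(len(d), list(d.values()))  # 1 ['new']
```

With int or str keys, the equivalent hashing and comparison run entirely in C, which is why the answer treats them as the safe case.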

That's all about CPython (the C implementation of Python). The answer may differ under other implementations! The Language Reference Manual is silent about this.

Tim Peters
  • Thanks for the very detailed answer! Yes, I'm using CPython and my keys are int/str. – Tim Feb 13 '19 at 23:38

You could look into locked-dict if you're OK with using an external library.

From its README:

Dict to allow context managed thread safe and mutable iterations through a lock.

Install it with:

pip install locked-dict

For example, from its tests:

import locked_dict

expected = 0
d = locked_dict.LockedDict()
assert len(d) == expected
assert bool(d) is False
assert d is not True
assert hasattr(d, '_lock')

empty_d = {}
assert d == empty_d

plain_old_d = {999: 'plain old dict', 12345: 54321}
assert d != plain_old_d

with d as m:
    assert len(m) == expected
    assert bool(m) is False
    assert m is not True
    assert hasattr(m, '_lock')
    assert m != plain_old_d
    assert m == empty_d

    m[0] = ['foo']
    expected += 1
    assert len(m) == expected
    assert bool(m) is True
    assert m is not False
    assert m != plain_old_d
    assert m != empty_d

    m.clear()
    expected -= 1
    assert len(m) == expected
    assert bool(m) is False
    assert m is not True
    assert m != plain_old_d
    assert m == empty_d

Note that this library is about 3 years old, although it may still be relevant to your use case.
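If you'd rather avoid the dependency, the same idea can be approximated with the standard library. The sketch below is a hypothetical SimpleLockedDict, not locked-dict's actual implementation: a dict subclass that holds a threading.RLock for the duration of a `with` block. Only operations performed inside `with` blocks are protected; plain dict calls elsewhere bypass the lock.

```python
import threading


class SimpleLockedDict(dict):
    """Hypothetical stdlib-only sketch of a dict usable as a lock context.

    Not the locked-dict package's code; it only mimics holding a lock
    for the duration of a `with` block.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._lock = threading.RLock()  # re-entrant, so nested `with` is safe

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False  # never swallow exceptions


shared = SimpleLockedDict()


def add_counts(n):
    for _ in range(n):
        # This read-modify-write spans several bytecodes, so it needs the lock.
        with shared as m:
            m["hits"] = m.get("hits", 0) + 1


threads = [threading.Thread(target=add_counts, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared["hits"])  # 40000
```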

Jab