Using a global dictionary with threads in Python

Question

Is accessing/changing dictionary values thread-safe?

I have a global dictionary foo and multiple threads with ids id1, id2, ... , idn. Is it OK to access and change foo's values without allocating a lock for it if it's known that each thread will only work with its id-related value, say thread with id1 will only work with foo[id1]?

You **are** using CPython, right? – Esteban Küber Aug 21 '09 at 14:44
@voyager: yes, I'm using CPython. – Alex Aug 21 '09 at 15:30

score 82 · Accepted Answer · edited Mar 25 '20 at 18:45

Assuming CPython: Yes and no. It is actually safe to fetch/store values from a shared dictionary in the sense that multiple concurrent read/write requests won't corrupt the dictionary. This is due to the global interpreter lock ("GIL") maintained by the implementation. That is:

Thread A running:

a = global_dict["foo"]

Thread B running:

global_dict["bar"] = "hello"

Thread C running:

global_dict["baz"] = "world"

won't corrupt the dictionary, even if all three access attempts happen at the "same" time. The interpreter will serialize them in some undefined way.
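A minimal sketch of the question's scenario, assuming CPython and that each thread writes only to its own key of a shared global dict. `foo` comes from the question; `worker` and `run` are illustrative names. Each iteration is one plain store with a builtin `int` key, the kind of access described above. Note that this relies on CPython implementation behavior rather than a documented language guarantee.

```python
import threading

foo = {}  # shared global dictionary, as in the question

def worker(thread_id, n):
    # Each thread touches only foo[thread_id]; no other thread writes this key.
    for i in range(n):
        foo[thread_id] = i  # plain store with a builtin (int) key

def run(num_threads=8, n=10_000):
    threads = [threading.Thread(target=worker, args=(tid, n))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return foo

if __name__ == "__main__":
    print(sorted(run().items()))
```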

However, the result of the following sequence is undefined:

Thread A:

if "foo" not in global_dict:
    global_dict["foo"] = 1

Thread B:

global_dict["foo"] = 2

as the test and the set in thread A are not atomic together (a "time-of-check/time-of-use" race condition). So it is generally best to lock things:

from threading import RLock

global_dict = {}  # the shared dictionary
lock = RLock()    # one lock guarding every compound access to global_dict

def thread_A():
    with lock:
        if "foo" not in global_dict:
            global_dict["foo"] = 1

def thread_B():
    with lock:
        global_dict["foo"] = 2

edited Mar 25 '20 at 18:45

Boris Verkhovskiy

answered Aug 21 '09 at 14:44

Dirk

Would `global_dict.setdefault("foo", 1)` in `Thread A` make the need for a lock unnecessary? – Claudiu Jun 13 '12 at 15:40

Am I understanding this correctly? As long as I'm adding to the dictionary without modification, it is safe, i.e. `dict['a'] = 1` in thread a and `dict['b'] = 2` in thread b is okay because keys a and b are not the same? – Cripto Jul 22 '13 at 11:34

@user1048138 -- No. What's safe and what's not depends on your application. Think about a class which has the fields `a` and `b` and the invariant that exactly one of those fields is not `None` and the other is `None`. Unless access is properly interlocked, any random combination of `a is [not] None` and `b is [not] None` may be observable, in clear violation of the invariant, if only a "naive" getter/setter is used (think: `def set_a(self,a): self.a = a; self.b = None if a is not None else self.b` -- a concurrent thread may observe illegal states during the execution) – Dirk Jul 22 '13 at 13:11
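A minimal sketch of the invariant described in that comment, using a hypothetical class `Exclusive`: both fields are changed and read under one `threading.Lock`, so a reader going through `snapshot()` never sees both fields `None` or both set. This assumes all access goes through the locked methods; direct attribute access would bypass the lock.

```python
import threading

class Exclusive:
    """Hypothetical class: exactly one of `a` and `b` is not None at all times."""

    def __init__(self):
        self._lock = threading.Lock()
        self.a = None
        self.b = 0  # start in a valid state: only b is set

    def set_a(self, a):
        if a is None:
            raise ValueError("a must not be None")
        # Both fields change while the lock is held, so no reader that also
        # takes the lock can observe a half-finished update.
        with self._lock:
            self.a = a
            self.b = None

    def set_b(self, b):
        if b is None:
            raise ValueError("b must not be None")
        with self._lock:
            self.b = b
            self.a = None

    def snapshot(self):
        # Read both fields as one consistent pair.
        with self._lock:
            return self.a, self.b
```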
Is there a way for me to place the lock on the dictionary data structure's write/update/delete operations? – blueberryfields Sep 10 '13 at 21:26

@Claudiu: `setdefault` will initialize atomically in CPython if the key is composed entirely of builtins implemented in C. The GIL protects you from races so long as the mutating part of an operation occurs with no byte codes in between beginning the mutation and completing it, and in the case of key insertion, you get that behavior when the `__eq__` and `__hash__` of an object are implemented in C, not Python level code. – ShadowRanger Dec 29 '15 at 21:22
@ShadowRanger - I just asked a question that somebody modded as being a duplicate of this one. What's still not immediately clear to me is if I can depend on `if val == a_dict.setdefault( key, val )` to atomically insert AND know if the value was inserted or already existed. – Brian McFarland Dec 29 '15 at 21:39
@BrianMcFarland: Assuming `val` was unique before the attempted insertion (no other thread could be trying to insert a reference to the exact same `val`), you could use `if val is a_dict.setdefault(key, val):` to perform object identity testing to properly identify whether an update occurred. There are some exceptions to the rule (if `val` is interned `str`, e.g. `str` literal, small `int`, `bool` or the empty `tuple` `()`, it's a singleton; they'd pass object identity testing even if independently "created" in different places). – ShadowRanger Dec 29 '15 at 22:02
@BrianMcFarland: That said, for most scalar values, it's not very useful; the big use case for atomicity with `setdefault` is mutable structures. `a_dict.setdefault(key, []).append(val)` is consistent assuming keys are never deleted or assigned new values, only `setdefault`-ed (or read) with the resulting value being mutated in place. – ShadowRanger Dec 29 '15 at 22:06
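A sketch of the two `setdefault` patterns from these comments, assuming CPython and builtin (`str`) keys; `groups`, `add` and `claim` are hypothetical names. `add` only ever `setdefault`s a key and then mutates the resulting list in place, and `claim` uses an identity test with a freshly created token to learn whether this particular call performed the insertion.

```python
import threading

# Keys are only ever setdefault-ed or read, never deleted or reassigned.
groups = {}

def add(key, val):
    # With builtin str keys, CPython performs the lookup-or-insert of
    # setdefault as one C-level call, so all threads share one list per key.
    groups.setdefault(key, []).append(val)

def claim(registry, key, token):
    # True only if *this* call inserted `token`; assumes `token` is a fresh
    # object that no other thread could insert (e.g. object()).
    return registry.setdefault(key, token) is token
```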
Is this also the case for PyPy? According to the documentation, PyPy also has a GIL, so I assume it behaves similarly: https://wiki.python.org/moin/GlobalInterpreterLock – Human Sep 18 '17 at 19:05

score 30 · Answer 2 · answered Aug 21 '09 at 15:36

The best, safest, portable way to have each thread work with independent data is:

import threading
tloc = threading.local()

Now each thread works with a totally independent tloc object even though it's a global name. The thread can get and set attributes on tloc, use tloc.__dict__ if it specifically needs a dictionary, etc.
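A small sketch of that isolation: the same global `tloc` name carries different attributes in the main thread and in a child thread. `show_isolation` is an illustrative helper; the child's observation goes into a list that only the child writes, so no extra locking is assumed.

```python
import threading

tloc = threading.local()  # one global name, separate attributes per thread

def show_isolation():
    tloc.owner = "main"       # set in the calling thread
    seen_in_child = []        # written only by the child thread

    def child():
        # The child starts without an 'owner' attribute of its own.
        seen_in_child.append(hasattr(tloc, "owner"))
        tloc.owner = "child"          # does not affect the caller's value
        tloc.__dict__["extra"] = 1    # per-thread __dict__ used as a dictionary

    t = threading.Thread(target=child)
    t.start()
    t.join()
    return seen_in_child[0], tloc.owner, hasattr(tloc, "extra")
```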

Thread-local storage for a thread goes away at the end of the thread; to have threads record their final results, have them put their results, before they terminate, into a common instance of Queue.Queue (queue.Queue in Python 3), which is intrinsically thread-safe. Similarly, initial values for data a thread is to work on could be arguments passed when the thread is started, or be taken from a Queue.
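A minimal sketch of that structure in Python 3, where the module is `queue` rather than `Queue`: initial data is passed as thread arguments, and each worker puts its final result on a shared `queue.Queue` before it exits. `worker` and `run` are hypothetical names.

```python
import queue
import threading

def worker(worker_id, data, out):
    # Initial data arrives as thread arguments; no shared dict is mutated.
    total = sum(data)
    out.put((worker_id, total))  # queue.Queue handles its own locking

def run(chunks):
    out = queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, chunk, out))
               for i, chunk in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Exactly one result per worker was queued before it finished.
    results = {}
    for _ in threads:
        worker_id, total = out.get()
        results[worker_id] = total
    return results
```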

Other half-baked approaches, such as hoping that operations that look atomic are indeed atomic, may happen to work for specific cases in a given version and release of Python, but could easily get broken by upgrades or ports. There's no real reason to risk such issues when a proper, clean, safe architecture is so easy to arrange, portable, handy, and fast.

Thread-local storage is both extreme overkill *and* invites non-trivial complexities (e.g., due to the need to recombine thread-local results) for simple situations like this. As suggested by saner answers, just: **(A)** globally declare a `dict_lock = threading.Lock()` or `dict_lock = threading.RLock()` and **(B)** wrap each dictionary access in a `with dict_lock:` context manager. – Cecil Curry Oct 19 '21 at 06:00
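A sketch of steps (A) and (B), assuming a shared dict of counters (`foo` and `bump` are illustrative names): the read-modify-write is wrapped in `with dict_lock:` so concurrent increments of the same key are not lost. A plain `threading.Lock` suffices here because the critical section never tries to re-acquire the lock.

```python
import threading

foo = {}
dict_lock = threading.Lock()  # (A) one module-level lock guarding foo

def bump(key, times):
    for _ in range(times):
        # (B) get + add + store spans several bytecodes, so the whole
        # update runs as one critical section.
        with dict_lock:
            foo[key] = foo.get(key, 0) + 1
```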

score 27 · Answer 3 · edited Apr 09 '15 at 09:29

I needed something similar and landed here, so I summed up your answers in this short snippet:

#!/usr/bin/env python3

import threading

class ThreadSafeDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._lock.release()

if __name__ == '__main__':
    u = ThreadSafeDict()
    with u as m:
        m[1] = 'foo'
    print(u)

This way, you can use the `with` construct to hold the lock while working with the dict.

edited Apr 09 '15 at 09:29

answered Apr 09 '15 at 07:24

yota

**Obfuscatory boilerplate.** Ideally, a class labelled `ThreadSafeDict` should be an *implicitly* thread-safe dictionary. This isn't; it's just a pointless thin wrapper around `threading.Lock`. Callers still have to manually wrap each dictionary operation in an explicit context manager – which is exactly what callers would do anyway with a direct `threading.Lock`. O_o – Cecil Curry Oct 19 '21 at 06:07
Well, I agree about the boilerplate ^_^; you can change the name if you find it obfuscatory. But concerning an implicitly thread-safe dict, I don't know the best way to express the fact that we can do more than one operation under the same lock... – yota Jan 10 '22 at 13:05
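One possible way to get both, sketched under the assumption that only the methods shown are needed: wrap a plain `dict` instead of subclassing it (builtin `dict` methods such as `update` do not route through an overridden `__setitem__`), lock inside every method, and make the wrapper a context manager over the same `threading.RLock`. Because the lock is reentrant, the individually locked methods can still be called inside `with d:`, which keeps a multi-step operation in one critical section. `LockedDict` is a hypothetical name.

```python
import threading

class LockedDict:
    """Hypothetical dict wrapper: every method locks implicitly, and
    `with d:` holds the same reentrant lock across several operations."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)
        # RLock, not Lock: the locked methods below may be called while
        # the same thread already holds the lock via `with d:`.
        self._lock = threading.RLock()

    def __getitem__(self, key):
        with self._lock:
            return self._data[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._data[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._data[key]

    def __contains__(self, key):
        with self._lock:
            return key in self._data

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def copy(self):
        # Plain dict snapshot taken under the lock.
        with self._lock:
            return dict(self._data)

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._lock.release()
        return False  # do not suppress exceptions

if __name__ == "__main__":
    d = LockedDict()
    d["a"] = 1            # single operation: locked implicitly
    with d:               # check-then-set kept in one critical section
        if "foo" not in d:
            d["foo"] = 1
    print(d.copy())
```

The trade-off is that every method pays for a lock acquisition, and only compound operations written inside `with d:` are protected as a unit.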

score 4 · Answer 4 · edited Jun 20 '20 at 09:12

The GIL takes care of that, if you happen to be using CPython.

global interpreter lock

The lock used by Python threads to assure that only one thread executes in the CPython virtual machine at a time. This simplifies the CPython implementation by assuring that no two processes can access the same memory at the same time. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines. Efforts have been made in the past to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity), but so far none have been successful because performance suffered in the common single-processor case.

See "Are locks unnecessary in multi-threaded Python code because of the GIL?".

edited Jun 20 '20 at 09:12

Community

answered Aug 21 '09 at 14:43

gimel

That only concerns CPython though. – Bastien Léonard Aug 21 '09 at 14:44
Unless he happens to be using Jython or IronPython. – Esteban Küber Aug 21 '09 at 14:45
@Bastien Léonard: Beat me to it :) – Esteban Küber Aug 21 '09 at 14:45

This doesn't mean that you can rely on the GIL. The key could be an instance of a class with a `__hash__` method, so more than 1 Python bytecode instruction is executed and the thread can switch *anyway*. Then there are I/O operations and native code sections that release the GIL. Locks are still very much a requirement for thread-safe code. – Martijn Pieters Dec 29 '15 at 21:17
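A small illustration of that point, using a hypothetical `Key` class whose `__hash__` and `__eq__` are written in Python: the counter shows that Python-level code runs in the middle of what looks like a single dictionary store or lookup, which is where CPython can switch threads. It only demonstrates that this code runs; it does not try to provoke a race.

```python
class Key:
    """Hypothetical key type with Python-level __hash__ and __eq__."""
    hash_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Ordinary Python bytecode executed during a dict operation,
        # so the "single" store is no longer a single step.
        Key.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name
```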

score 1 · Answer 5 · answered Aug 31 '15 at 04:06

How does it work?

>>> import dis
>>> demo = {}
>>> def set_dict():
...     demo['name'] = 'Jatin Kumar'
...
>>> dis.dis(set_dict)
  2           0 LOAD_CONST               1 ('Jatin Kumar')
              3 LOAD_GLOBAL              0 (demo)
              6 LOAD_CONST               2 ('name')
              9 STORE_SUBSCR
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE

Each of the above instructions is executed while holding the GIL, and the STORE_SUBSCR instruction adds or updates the key/value pair in the dictionary. So the dictionary update is atomic and hence thread-safe.
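The listing above comes from a pre-3.6 CPython (3-byte instructions), and opcode names and counts differ between versions, so the sketch below prints them with `dis.get_instructions` instead of assuming a specific output. It contrasts the single store with an augmented assignment, whose load, add and store are separate instructions, so the one-instruction argument does not extend to it. `bump` and `opnames` are illustrative names; `bump` is only disassembled, never called.

```python
import dis

demo = {}

def set_dict():
    demo['name'] = 'Jatin Kumar'  # one store into the dict

def bump():
    demo['count'] += 1  # load, add and store as separate steps

def opnames(func):
    # Opcode names only; offsets and arguments vary between CPython versions.
    return [ins.opname for ins in dis.get_instructions(func)]

if __name__ == "__main__":
    print(opnames(set_dict))
    print(opnames(bump))
```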