3

Is checking if a key is in a dictionary - if key in mydict - an atomic operation?

If not, would there be any negative effects if a thread was checking for a key while another thread was modifying the dictionary? The checking thread doesn't modify the dictionary, it just behaves differently depending on the presence of a key.

Tim
  • 11,710
  • 4
  • 42
  • 43
  • 2
    possible duplicate: http://stackoverflow.com/questions/3358770/python-dictionary-is-thread-safe – Daniel Figueroa Dec 29 '12 at 00:57
  • You could check this yourself in far less time than it would take for anyone to answer. – Pulpo Dec 29 '12 at 00:57
  • I'm pretty sure that most built-in methods of dictionaries are atomic thanks to the Global Interpreter Lock in CPython, but that's an implementation detail, not a language guarantee. – Blckknght Dec 29 '12 at 01:00
  • @DanielFigueroa: Unless I'm missing it, that doesn't mention behaviour if one thread tries to check for the presence of a key while another thread is modifying the dictionary. – Tim Dec 29 '12 at 01:02
  • 1
    @Tim Actually, it does (see Alex Martelli's answer below the accepted one), though it's only a small part. Still, it may be worth an independent question. –  Dec 29 '12 at 01:05
  • 5
    @John: how exactly would you "check this yourself"? The kinds of race conditions being discussed here are very difficult to reproduce. – Ned Batchelder Dec 29 '12 at 01:29
  • @NedBatchelder: You could write some C extension class that releases the GIL and blocks for a few hundred ms and logs like made in its implementations of the hash and eq slots, which would make the race conditions a lot easier to reproduce… – abarnert Dec 29 '12 at 02:14
  • state is the root of all evil. – Paulo Scardine Dec 29 '12 at 02:19
  • @PauloScardine: And that's why we all program in Haskell (and avoid evaluating IO inside main) instead of Python, right? :) But seriously, if you're saying that this is yet another good argument for why you should do `d = filteredCopyOf(d)` instead of `filterInPlace(d)`, I agree. – abarnert Dec 29 '12 at 02:34
  • @abarnet: IMHO python sucks at multithreading in general - for concurrency, the "task queue" approach (using celery or redis) seems to be away more popular than classic multithreading. – Paulo Scardine Dec 29 '12 at 02:40
  • @PauloScardine: You're arguing two almost-contradictory things at the same time. First, shared-state multithreading is itself bad, because state is hard to reason about. Agreed, but Python isn't worse than any other language there—in fact, because of the GIL, it's actually better than most. On the other hand, when you _do_ want to share state and reason things down to the edge of every race (which usually means you're misleading yourself, but not _always_), that's where Python sucks, and again it's because of the GIL. – abarnert Dec 29 '12 at 03:01
  • @abarnert: the first time multithreading made a python program of mine perform worst than the single threaded version, I realized multithreading is somewhat useless for CPU bound tasks in Python, and for I/O bound tasks there are several interesting approaches. – Paulo Scardine Dec 29 '12 at 03:17
  • @PauloScardine: Of course, there are many interesting approaches to concurrency for non-CPU-bound apps. That's exactly as true in Python as in any other language. One of those approaches is multithreading—for which Python is slightly better than most other imperative languages in this case, not worse. So, it's silly to say that it "sucks at multithreading" when talking about non-CPU-bound cases when the opposite it true. For CPU-bound cases, it _does_ suck, but the approaches you're talking about aren't relevant there, so it's a non-sequitur. – abarnert Dec 29 '12 at 07:32

1 Answers1

4

I don't think "atomic" is really what you're interested in.

If you're not using CPython (or you are, but one of the threads is running C code that's not under the GIL…), it is definitely not atomic, but it may be safe anyway under certain conditions.

If you are using CPython, it's atomic in the sense that in is a single bytecode op (COMPARE_OP 6), and in the possibly more useful sense that the actual hash-table lookup itself definitely happens under the GIL, and any potential equality comparison definitely happens with an object that's guaranteed to be alive. But it may still be unsafe, except under certain conditions.

First, the higher-level operation you're doing here is inherently racy. If thread 1 can be doing d['foo'] = 3 or del d['foo'] at the same time thread 0 is calling 'foo' in d, there is no right answer. It's not a question of atomic or not—there is no sequencing here.

But if you do have some kind of explicit sequencing at the application level, so there is a right answer to get, then you're only guaranteed to get the right answer if both threads hold the GIL. And I think that's what you're asking about, yes?

That's only even possible in CPython—and, even there, it amounts to guaranteeing that no object you ever put in a dict could ever release the GIL when you try to hash or == it, which is a hard thing to guarantee generally.

Now, what if the other thread is just replacing the value associated with a key, not changing the set of keys? Then there is a right answer, and it's available concurrently, as long as the dict implementation avoids mutating the hash table for this operation. At least in versions of CPython up to whatever was released by Jul 29 '10. Alex Martelli indirectly guarantees that in his answer to python dictionary is thread safe?. So, in that restricted case, you are safe in CPython—and likely in other implementations, but you'd want to read the code before relying on that.

As pointed out in the comments, the key you may end up comparing your lookup value with isn't guaranteed to be immutable, so even if the other thread doesn't do anything that changes the set of keys, it's still not absolutely guaranteed that you'll get the right answer. (You may have to craft a pathological key type to make this fail, but it would still be a legal key type.)

Community
  • 1
  • 1
abarnert
  • 354,177
  • 51
  • 601
  • 671
  • 1
    You said "an object that's guaranteed to be ... immutable.": Actually, it's not guaranteed to be immutable. for the most part, many python types which are hashable are also immutable; but there's no reasonable guarantee of that; For instance, I could define a class with a very inconsiderate `__hash__`, which changes the state of the object to randomly swap `__eq__` between `lambda _:True` and `lambda _:False`. python can neither prevent nor even detect this situation. the only guarantee is that even if this is going on, the misbehaving object is still *reachable* in the dict, with `keys()`. – SingleNegationElimination Dec 29 '12 at 02:21
  • @TokenMacGuy: It's actually worse than just illegal-but-unpreventable; it's actually _legal_ for the key to change value-by-eq, as long as it doesn't break the rule that value-equality implies hash-equality. (So it would be illegal to go from false to true unpredictably, but not the other way around.) – abarnert Dec 29 '12 at 02:31
  • 1
    hmmm... how can I use this loophole for evil? (commence cackling) – SingleNegationElimination Dec 29 '12 at 02:34
  • @TokenMacGuy: I'm not sure legal-but-pathological cases are better than illegal-but-undetectable ones. Breakin' in the law inherently makes it more evil, right? Or just more metal? I don't know what it's like… – abarnert Dec 29 '12 at 02:55