I have a function call that starts 10 threads. Before these threads start, I have

from collections import defaultdict
output = defaultdict(dict)

and output is empty.

Each thread will generate data to write to the dictionary.

Something like:

output['water'] = 'h20'
output['fire'] = 'delta of oxygen'
....

The threads only add items; they do not iterate over or modify any other items. For example, output['water'] is a different item from output['fire']. I can also guarantee that no two threads will create the same item: each thread T has a unique key i, so output[i] is unique per thread.

Is this dictionary thread safe in this regard?

Cripto
  • You know that python is actually quite bad at threading? Consider whether you even want to do this. – Marcin Jul 25 '13 at 18:44
  • If you send the entries through a queue, they are automatically thread safe. – Jiminion Jul 25 '13 at 18:47
  • Yes, it is safe because the GIL will prevent more than one thread from executing Python code concurrently. – Bakuriu Jul 25 '13 at 18:49
  • @Marcin It's not any better or worse than any other procedural language. In fact, I would argue that with Python 3, it's actually better than many. It won't beat a functional language in threading, but I don't think there is a procedural language that could. – supercheetah Jul 25 '13 at 18:57
  • Why would you use a `defaultdict(dict)` if the values are just going to be strings? – user2357112 Jul 25 '13 at 19:00
  • @Bakuriu: The GIL won't make accessing arbitrary data structures thread-safe. – Sven Marnach Jul 25 '13 at 19:01
  • Why do you want to use threads? What are you actually trying to achieve? – Sven Marnach Jul 25 '13 at 19:02
  • @supercheetah Most other languages don't have a GIL. It's actually moderately tricky to get python threading to do anything remotely useful. – Marcin Jul 25 '13 at 19:07
  • [Here](http://stackoverflow.com/questions/3358770/python-dictionary-is-thread-safe) is a question and various answers that may help. – mshildt Jul 25 '13 at 19:12
  • @SvenMarnach Since the assignment is performed in a single bytecode, and AFAIK bytecodes are atomic, it should be safe [maybe this has changed since they reworked the GIL some time ago? I'm sure it *was* true some years ago]. – Bakuriu Jul 25 '13 at 19:17
  • @Marcin: I strongly disagree. The most common use case of threading is to avoid blocking. Most GUI applications in Python use threads for this purpose. Only a tiny fraction of threaded applications is actually CPU-bound, so for them, threads in Python are not any different from any other language. It's also easy to write threaded (CPU-bound) number-crunching code with NumPy (which releases the GIL for the heavy lifting), and many other applications. – Sven Marnach Jul 25 '13 at 19:17
  • @Bakuriu: The GIL hasn't changed in a while. Adding new items to a defaultdict isn't a single bytecode, and it's not atomic. (Even for a normal dict, an instruction like `d[key] = value` will result in three load instructions and one `STORE_SUBSCR`.) – Sven Marnach Jul 25 '13 at 19:22
  • @SvenMarnach Given that python threading is purely co-operative, and there can only be one active thread, there is more opportunity to write poor (i.e. pointlessly so) threaded code in python than in languages where either of those is not true. Of course, threads in general create traps for the unwary, so while this isn't unique to python, I would say that it particularly behooves the python user to consider whether some other concurrency mechanism is appropriate. – Marcin Jul 25 '13 at 19:41
  • While this is not very scientific of me, I think the people saying that it is thread safe are correct. I just ran it with 90 threads at once with no errors, and I checked all 90 results. They look fine. Again, not very scientific of me. – Cripto Jul 25 '13 at 19:43
  • Your 90 tests have just shown you that there are scenarios in which the writes look like they were thread-safe. You can't really claim an operation to be thread-safe just by running it x number of times. The code could turn out not to be thread-safe when your computer is under stress, or with more CPU threads, or because of many other factors. – Maciej Gol Jul 25 '13 at 20:32
  • @Marcin: I was mainly disagreeing with the claim that "it's actually moderately tricky to get Python threading to do anything remotely useful." It's true that you are often better off with `multiprocessing` or whatever, but that wasn't the point. Python's threading isn't "purely co-operative" -- in fact, it's actually mostly preemptive. And there can be more than one active thread as well, but only one of them can execute Python byte code. When doing number crunching, this is usually completely irrelevant, since the actual work isn't done in pure Python. – Sven Marnach Jul 25 '13 at 22:46
  • @kroolik, and anyone else for that matter, can someone please explain why they think it is or is not thread safe? This thread is not about whether Python is good for threading or not. – Cripto Jul 26 '13 at 01:11

1 Answer

Yes.

If you are using CPython and strings as keys, then yes. The GIL in CPython ensures that only one thread executes bytecode at a time, and setting a key to a value in a dict happens in a single opcode, STORE_SUBSCR, so another thread cannot run partway through the write. If you are not using CPython, all bets are off. The same goes for keys that have custom __hash__, __eq__, or __cmp__ methods, since those run Python code in the middle of the store and give other threads a chance to run.

If I had a soapbox, I'd hop on it and warn you of the evils of relying on implementation details like this for correctness. It's more pythonic of you to write something that works only for the case and in the environment where it will be used, since doing otherwise could be seen as a premature optimization. Enjoy your working code!

>>> from dis import dis
>>> dis(compile('output = defaultdict(dict); output["water"] = "H2O"', 'example', 'exec'))
  1           0 LOAD_NAME                0 (defaultdict)
              3 LOAD_NAME                1 (dict)
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 STORE_NAME               2 (output)
             12 LOAD_CONST               0 ('H2O')
             15 LOAD_NAME                2 (output)
             18 LOAD_CONST               1 ('water')
             21 STORE_SUBSCR
             22 LOAD_CONST               2 (None)
             25 RETURN_VALUE
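
As a rough illustration of the pattern in the question, here is a minimal sketch in which each thread writes exactly one key that no other thread touches. The names `worker` and `run` and the `item-N` key format are made up for this example, and the sketch assumes CPython with plain string keys, as described above. Running it without errors does not prove thread safety; it only shows the shape of the code under discussion.

```python
import threading
from collections import defaultdict

# Shared dictionary, empty before any thread starts (as in the question).
output = defaultdict(dict)

def worker(i):
    # Hypothetical worker: writes a single key unique to this thread.
    # Item assignment does not go through the defaultdict factory;
    # only a missing-key lookup does.
    output[f"item-{i}"] = f"value-{i}"

def run(n_threads=10):
    # Start all threads, then wait for every one of them to finish
    # before reading the results.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output

if __name__ == "__main__":
    result = run()
    print(sorted(result))
```

Joining every thread before reading `output` matters as much as the writes themselves: the main thread should not inspect the dictionary while workers may still be adding to it.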

This has been discussed elsewhere.
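
If you would rather not depend on the CPython details described above, one common alternative is to guard each write with a `threading.Lock`. This is only a sketch of that option, not something the question requires; `output_lock`, `store`, and `run_locked` are hypothetical names chosen for the example.

```python
import threading
from collections import defaultdict

output = defaultdict(dict)
output_lock = threading.Lock()  # hypothetical name; guards every write to output

def store(key, value):
    # Only one thread can hold the lock at a time, so writes are
    # serialized no matter how the interpreter schedules threads.
    with output_lock:
        output[key] = value

def run_locked(n_threads=10):
    threads = [
        threading.Thread(target=store, args=(f"item-{i}", f"value-{i}"))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Return a plain dict snapshot once all writers have finished.
    return dict(output)
```

The lock adds a small cost per write, but the correctness argument no longer depends on which interpreter runs the code or on what kind of keys are used.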

Peter G