I have a simple function for extracting keys and values from a dictionary.

def separate_kv_fast(adict):
    '''Separates keys/values from a dictionary to corresponding arrays'''
    return adict.keys(), adict.values() 

I know the order is guaranteed to correspond if the dictionary "adict" is not modified between the .keys() and .values() calls. What I am wondering is whether the return statement guarantees this; basically, is it going to be thread safe?

Is the following construction of "adict" any safer for multi-threading, or is it not needed?

def separate_kv_fast(adict):
    '''Separates keys/values from a dictionary to corresponding arrays'''
    bdict = dict(adict)
    return bdict.keys(), bdict.values() 
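
For concreteness, this is roughly the scenario I'm worried about (a minimal sketch, Python 2, with a writer thread added purely for illustration):

import threading

shared = dict((i, str(i)) for i in xrange(1000))

def writer():
    # mutate the shared dict while another thread reads from it
    for i in xrange(1000, 2000):
        shared[i] = str(i)

def separate_kv_fast(adict):
    return adict.keys(), adict.values()

t = threading.Thread(target=writer)
t.start()
keys, values = separate_kv_fast(shared)
t.join()
# If the writer got scheduled between the two calls, the snapshots can disagree.
print len(keys) == len(values)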

1 Answer

I've been working on learning Python disassembly, and I believe this shows the two calls are not atomic:

>>> dis.dis(separate_kv_fast)                                                                             
  2           0 LOAD_FAST                0 (adict)
              3 LOAD_ATTR                0 (keys)
              6 CALL_FUNCTION            0
              9 LOAD_FAST                0 (adict)
             12 LOAD_ATTR                1 (values)
             15 CALL_FUNCTION            0
             18 BUILD_TUPLE              2
             21 RETURN_VALUE        
>>> 

The fact that keys() and values() are invoked across several separate opcodes demonstrates, I believe, that the pair of calls is not atomic: the interpreter can switch threads between any two of those opcodes.
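
(For reference, the disassembly above was produced along these lines, assuming the function exactly as defined in the question:)

import dis

def separate_kv_fast(adict):
    '''Separates keys/values from a dictionary to corresponding arrays'''
    return adict.keys(), adict.values()

dis.dis(separate_kv_fast)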

Let's see how your bdict = dict(adict) works out:

  2           0 LOAD_GLOBAL              0 (dict)
              3 LOAD_FAST                0 (adict)
              6 CALL_FUNCTION            1
              9 STORE_FAST               1 (bdict)

LOAD_FAST pushes a reference to adict onto the stack, and we then call dict with that single argument. What we don't know is whether the dict() call itself is atomic.

bdict = adict.copy() gives a similar disassembly. adict.copy itself can't be disassembled because it is implemented in C.
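
For example, disassembling a copy-based variant (a sketch; copy_version is just an illustrative name) shows the copy collapsing into a single CALL_FUNCTION opcode whose body runs in C, which is why dis can't follow it:

import dis

def copy_version(adict):
    bdict = adict.copy()      # one CALL_FUNCTION opcode; the actual copy happens in C
    return bdict.keys(), bdict.values()

dis.dis(copy_version)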

Everything I read says that the built-in types are thread safe, so I believe a single call into a dictionary is internally consistent: items(), copy(), values(), keys(), etc. Two calls in series (values() followed by keys()) aren't necessarily safe, and neither are iterators.

Is there a reason you're not just using items()?
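
A single items() call returns one consistent snapshot (a plain list in Python 2), which can then be split; here's a sketch, with separate_kv_items just an illustrative name:

def separate_kv_items(adict):
    '''Sketch: split one atomic items() snapshot into keys and values.'''
    items = adict.items()            # one call into the dict
    keys = [k for k, _ in items]
    values = [v for _, v in items]
    return keys, values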

I was curious, so I went ahead and benchmarked:

#!/usr/bin/python
# Python 2 benchmark: four ways of splitting a dict into keys/values lists.
import timeit

D = dict()
for x in xrange(0, 1000):
    D[x] = str(x)

def a():
    # two separate calls; not atomic as a pair
    return D.keys(), D.values()

def b():
    # iterate over an items() snapshot and split manually
    keys = []
    values = []
    for k, v in D.items():
        keys.append(k)
        values.append(v)
    return keys, values

def c():
    # copy first, then read keys()/values() from the private copy
    d = D.copy()
    return d.keys(), d.values()

def d():
    # transpose a single items() snapshot with zip()
    return zip(*D.items())

print timeit.timeit("a()", 'from __main__ import a')
print timeit.timeit("b()", 'from __main__ import b')
print timeit.timeit("c()", 'from __main__ import c')
print timeit.timeit("d()", 'from __main__ import d')

Results:

a(): 6.56165385246
b(): 145.151810169
c(): 19.9027020931
d(): 65.4051799774

The copy() version is the fastest atomic one (and might be slightly faster than using dict()).
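
(If you want to check the copy() vs. dict() difference yourself, here is a quick comparison in the same spirit; the setup string is just an assumption mirroring the benchmark above:)

import timeit

setup = "D = dict((x, str(x)) for x in xrange(1000))"
print timeit.timeit("D.copy()", setup)
print timeit.timeit("dict(D)", setup)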

  • The disassembled version definitely adds insight. My first version generate_kv() uses .items()/.iteritems() which is certainly safer, but then requires iteration over them to extract the keys and values with some speed penalty. It's not much but I got curious especially since I am considering using multiprocessing or multi-threading to generate the dictionary. Thanks! – clocker Jun 19 '16 at 04:20
  • `iteritems()` I would expect to have issues since it is reentrant. `items()` just returns a list of key value pairs. I would expect it to be as safe (and slow) as `copy()` – rrauenza Jun 19 '16 at 04:24
  • 1
    You realize if you use `multiprocessing`, the objects are not shared over the processes? So concurrency of your dictionary is not a problem. You might be interested in [this question](http://stackoverflow.com/questions/37890199/reduce-execution-time-on-huge-list-generation) that I answered -- it was a large list being generated over `multiprocessing`. – rrauenza Jun 19 '16 at 05:40