I am currently working on a modeling environment in Python which uses dicts to share connection properties of connected parts. My current way of doing this takes about 15-20% of my total program run time, which is quite a lot given a few million iterations...
So I find myself looking at how to speed up updating multiple values in dicts and getting multiple values from dicts.
My example dict looks like this (the number of key-value pairs is expected to stay in the current range of 300 to 1000, so I filled it up to that amount):
import numpy as np

val_dict = {'a': 5.0, 'b': 18.8, 'c': -55/2}
for i in range(200):
    val_dict[str(i)] = i
    val_dict[i] = i**2
keys = ('b', 123, '89', 'c')
new_values = np.arange(10, 41, 10)
length = new_values.shape[0]
While keys, the shape of new_values, and the number of key-value pairs in val_dict will always be constant, the values of new_values change at each iteration and thus have to be updated at each iteration (and also be retrieved at each iteration from within another part of my code).
I timed several methods, and for getting multiple values from a dict the fastest seems to be itemgetter from the operator module. I can define getter before the iteration starts, because the needed variables are constant:
from operator import itemgetter

getter = itemgetter(*keys)
%timeit getter(val_dict)
The slowest run took 10.45 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 140 ns per loop
I guess this is quite ok, or is there anything faster?
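For comparison, the only plain-Python alternatives I could think of are a list comprehension and map with dict.__getitem__; a minimal sketch using the same keys and val_dict as above (I have not timed these against itemgetter here):

# Alternative lookups over the same constant keys; both return the
# values in the same order as getter(val_dict) above.
values_comp = [val_dict[k] for k in keys]
values_map = list(map(val_dict.__getitem__, keys))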
But when assigning these values to a numpy array via an index array (fancy indexing), it slows down horribly:
result = np.ones(25)
idx = np.array((0, 5, 8, -1))
def getter_fun(result, idx, getter, val_dict):
    result[idx] = getter(val_dict)
%timeit getter_fun(result, idx, getter, val_dict)
The slowest run took 11.44 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.77 µs per loop
Is there any way I can improve that? I guess the tuple unpacking is the worst part here...
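One variant I could sketch (not timed above) is to skip the intermediate tuple entirely and write each looked-up value into the array one by one; idx_list here is just a hypothetical plain-tuple copy of idx:

# Hypothetical variant: look up each key and assign it directly into
# result, avoiding the temporary tuple built by getter(val_dict).
idx_list = (0, 5, 8, -1)

def loop_getter_fun(result, idx_list, keys, val_dict):
    for j, k in zip(idx_list, keys):
        result[j] = val_dict[k]

Whether this helps probably depends on how expensive scalar assignment into a NumPy array is compared to the single fancy-indexing call.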
For setting multiple values I've timed a few ways to do it: A function which unpacks the values, a function which uses update with key-value-pairs given, a function using a for-loop, a dict comprehension and a generator function.
def unpack_putter(val_dict, keys, new_values):
    (val_dict[keys[0]],
     val_dict[keys[1]],
     val_dict[keys[2]],
     val_dict[keys[3]]) = new_values
%timeit unpack_putter(val_dict, keys, new_values)
The slowest run took 8.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.29 µs per loop
def upd_putter(val_dict, keys, new_values):
    val_dict.update({keys[0]: new_values[0],
                     keys[1]: new_values[1],
                     keys[2]: new_values[2],
                     keys[3]: new_values[3]})
%timeit upd_putter(val_dict, keys, new_values)
The slowest run took 15.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 963 ns per loop
def for_putter(val_dict, keys, new_values, length):
    for i in range(length):
        val_dict[keys[i]] = new_values[i]
%timeit for_putter(val_dict, keys, new_values, length)
The slowest run took 12.31 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.14 µs per loop
def dictcomp_putter(val_dict, keys, new_values, length):
    val_dict.update({keys[i]: new_values[i] for i in range(length)})
%timeit dictcomp_putter(val_dict, keys, new_values, length)
The slowest run took 7.13 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.69 µs per loop
def gen_putter(val_dict, keys, new_values, length):
    gen = ((keys[i], new_values[i]) for i in range(length))
    val_dict.update(dict(gen))
%timeit gen_putter(val_dict, keys, new_values, length)
The slowest run took 10.03 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.54 µs per loop
The upd_putter would be the fastest, but can I somehow use it with an alternating shape of keys and new_values? (They will still be constant during the iterations, but each part being considered has a different number of keys to update, which has to be determined by user input.) Interestingly, the for loop seems quite OK to me. So I guess I'm doing it wrong and there must be a faster way to do it.
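For what it's worth, dict.update also accepts an iterable of key-value pairs, so a zip-based variant would at least work for any number of keys without hard-coding the indices like upd_putter does; a minimal sketch (not timed above):

def zip_putter(val_dict, keys, new_values):
    # update() takes any iterable of (key, value) pairs, so this works
    # for whatever number of keys the user input produces
    val_dict.update(zip(keys, new_values))

Calling new_values.tolist() first would store plain Python floats instead of NumPy scalars, which might matter for the code that later reads the values back.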
One last thing to consider: I'll most probably use Cython soon, so I guess this will make the for loop favorable? Or I could use joblib to parallelize the for loop. I also thought about using numba, but then I'd have to get rid of all dicts...
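In case dropping the dicts really becomes necessary (e.g. for numba), the layout I have in mind would be a single flat array plus a key-to-position mapping that is resolved once outside the hot loop; a rough sketch with placeholder names, not something I have implemented yet:

# Rough sketch of a dict-free layout: all values live in one flat array,
# and the key -> position lookup happens only once, outside the loop.
all_keys = list(val_dict.keys())
key_to_pos = {k: i for i, k in enumerate(all_keys)}
values_arr = np.array([float(val_dict[k]) for k in all_keys])

# keys are constant across iterations, so the positions are too
update_pos = np.array([key_to_pos[k] for k in keys])

# inside the loop: pure NumPy fancy indexing, no dict access at all
values_arr[update_pos] = new_values
retrieved = values_arr[update_pos]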
Hopefully you can help me with this problem.
Edit for MSeifert (even though I'm not sure if you meant it like that):
tuplelist = list()
for i in range(200):
    tuplelist.append(i)
    tuplelist.append(str(i))
keys_long = tuple(tuplelist)
new_values_long = np.arange(0,400)
%timeit for_putter(val_dict, keys_long, new_values_long, 400)
10000 loops, best of 3: 73.5 µs per loop
%timeit dictcomp_putter(val_dict, keys_long, new_values_long, 400)
10000 loops, best of 3: 96.4 µs per loop
%timeit gen_putter(val_dict, keys_long, new_values_long, 400)
10000 loops, best of 3: 129 µs per loop