Why does getting values out of a map iterator take so long?

Question

I have two dictionaries, A and B. A is a dictionary of dictionaries. The keys in the second level dictionaries match the keys in B.

For example, A could be:

A[key1][key_1] = 1
A[key1][key_2] = 4 
A[key1][key_3] = 2
A[key2][key_2] = 5
A[key3][key_1] = 1
A[key3][key_3] = 2

and B could be:

B[key_1] = 7
B[key_2] = 8 
B[key_3] = 9

I've written a loop to multiply the values in each key of A by B

for Akey in A.keys():
    sum_Akey[Akey] = sum(map(lambda x: A[Akey][x]*B[x], B))

where sum_Akey is a dictionary for storing the sums. It is keyed by the same values as the top level keys in A.

For example: sum_Akey[key1] = 1*7 + 4*8 + 2*9 = 57

With a large enough A and B, this takes a really long time.

Out of curiosity, I removed the sum() to see what would happen. Removing the sum() makes it run much faster. I tried other approaches, e.g., making a list out of the map and then summing.

It appears that doing anything on the map object is the bottleneck.

Is there another, quicker way to get the sum of the values in the map iterator?

Is there a faster way of getting the final sum?

NOTE: I found the Q&A just now. It answers one of my questions. python map, list(map), lambda and performance

This might be done faster via pandas -- convert to dataframes, perform a sql-style merge based on B-keys, multiply columns, and then `df.groupby(A-key).sum()` — kevinsa5, Mar 06 '18 at 15:00
A decent improvement might be achieved by doing `temp = A[Akey]` and then `sum(map(lambda x, y: temp[x]*y, B.items()))`. I am assuming Python 3 — Ma0, Mar 06 '18 at 15:02

jpp · Accepted Answer · 2018-03-06T15:19:09.087

0

lambda with map is inefficient. You can see some performance improvement using a comprehension:

from collections import defaultdict
import random

A = defaultdict(lambda: defaultdict(int))
B = {}

n = 1000
for i in range(n):
    for j in range(n):
        A[i][j] = random.randint(0, 9)

B = {i: random.randint(0, 9) for i in range(n)}

def original():
    for Akey in A.keys():
        sum_Akey[Akey] = sum(map(lambda x: A[Akey][x]*B[x], B))
    return sum_Akey

def new():
    return {Akey: sum(A[Akey][x] * B[x] for x in B) for Akey in A}

%timeit original()  # 1 loop, best of 3: 345 ms per loop
%timeit new()       # 1 loop, best of 3: 289 ms per loop

edited Mar 06 '18 at 15:19

answered Mar 06 '18 at 15:10

jpp

159,742
34
281
339

Thanks! Follow up. I bumped n up to 10,000. The timings were 28.6 sec and 18.9 sec respectively. Is that fast? Can it get faster? (I'm new to this.) – user1757436 Mar 06 '18 at 19:24
There may be other solutions (see comments to this question). Might be worth waiting a bit longer to see if someone else responds. – jpp Mar 06 '18 at 20:24

Why does getting values out of a map iterator take so long?

1 Answers1