1

I have two dictionaries, A and B. A is a dictionary of dictionaries. The keys in the second level dictionaries match the keys in B.

For example, A could be:

A[key1][key_1] = 1
A[key1][key_2] = 4 
A[key1][key_3] = 2
A[key2][key_2] = 5
A[key3][key_1] = 1
A[key3][key_3] = 2

and B could be:

B[key_1] = 7
B[key_2] = 8 
B[key_3] = 9

I've written a loop to multiply the values in each key of A by B

for Akey in A.keys():
    sum_Akey[Akey] = sum(map(lambda x: A[Akey][x]*B[x], B))

where sum_Akey is a dictionary for storing the sums. It is keyed by the same values as the top level keys in A.

For example: sum_Akey[key1] = 1*7 + 4*8 + 2*9 = 57

With a large enough A and B, this takes a really long time.

Out of curiosity, I removed the sum() to see what would happen. Removing the sum() makes it run much faster. I tried other approaches, e.g., making a list out of the map and then summing.

It appears that doing anything on the map object is the bottleneck.

Is there another, quicker way to get the sum of the values in the map iterator?

Is there a faster way of getting the final sum?

NOTE: I found the Q&A just now. It answers one of my questions. python map, list(map), lambda and performance

user1757436
  • 151
  • 4
  • This might be done faster via pandas -- convert to dataframes, perform a sql-style merge based on B-keys, multiply columns, and then `df.groupby(A-key).sum()` – kevinsa5 Mar 06 '18 at 15:00
  • 1
    A decent improvement might be achieved by doing `temp = A[Akey]` and then `sum(map(lambda x, y: temp[x]*y, B.items()))`. I am assuming Python 3 – Ma0 Mar 06 '18 at 15:02
  • Yes. Python 3.5 – user1757436 Mar 06 '18 at 15:54

1 Answers1

0

lambda with map is inefficient. You can see some performance improvement using a comprehension:

from collections import defaultdict
import random

A = defaultdict(lambda: defaultdict(int))
B = {}

n = 1000
for i in range(n):
    for j in range(n):
        A[i][j] = random.randint(0, 9)

B = {i: random.randint(0, 9) for i in range(n)}

def original():
    for Akey in A.keys():
        sum_Akey[Akey] = sum(map(lambda x: A[Akey][x]*B[x], B))
    return sum_Akey

def new():
    return {Akey: sum(A[Akey][x] * B[x] for x in B) for Akey in A}

%timeit original()  # 1 loop, best of 3: 345 ms per loop
%timeit new()       # 1 loop, best of 3: 289 ms per loop
jpp
  • 159,742
  • 34
  • 281
  • 339
  • Thanks! Follow up. I bumped n up to 10,000. The timings were 28.6 sec and 18.9 sec respectively. Is that fast? Can it get faster? (I'm new to this.) – user1757436 Mar 06 '18 at 19:24
  • There may be other solutions (see comments to this question). Might be worth waiting a bit longer to see if someone else responds. – jpp Mar 06 '18 at 20:24