None of the solutions so far handles the general case of intersecting N dictionaries. So, if you want to handle the intersection of N arbitrary dictionaries:
```
from functools import reduce

def dict_intersection(*dict_list):
    # Fold a pairwise items-set intersection across all the dictionaries.
    return reduce(lambda a, b: dict(a.items() & b.items()), dict_list)

a = {k: k for k in range(0, 5)}  # {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
b = {k: k for k in range(2, 7)}  # {2: 2, 3: 3, 4: 4, 5: 5, 6: 6}
c = {k: k for k in range(3, 8)}  # {3: 3, 4: 4, 5: 5, 6: 6, 7: 7}

dict_intersection(a, b, c)  # {3: 3, 4: 4}

# or if you have a list of dicts
dicts = [{k: k for k in range(0 + n, 5 + n)} for n in (0, 2, 3)]  # == [a, b, c]
dict_intersection(*dicts)  # {3: 3, 4: 4}
```
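One caveat: `a.items() & b.items()` builds sets of `(key, value)` pairs, so it raises `TypeError` if any value is unhashable (e.g. a list). A minimal sketch of a key-based variant that sidesteps this, assuming you're happy taking the value from the first dictionary (`dict_key_intersection` is a hypothetical name, not part of the solution above):

```
from functools import reduce

def dict_key_intersection(*dict_list):
    # Intersect on keys only, so the values do not need to be hashable.
    common = reduce(lambda a, b: a & b, (d.keys() for d in dict_list))
    # Keep the value from the first dictionary for each surviving key.
    return {k: dict_list[0][k] for k in common}

x = {0: [0], 1: [1], 2: [2]}
y = {1: [1], 2: [2], 3: [3]}
dict_key_intersection(x, y)  # {1: [1], 2: [2]}
```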
Using `functools.reduce` completes the operation in a single pass over the list of dictionaries, instead of the multiple loops in some of the other solutions, and it performs no additional conditional checks.
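For illustration, the `reduce` call is roughly equivalent to this explicit loop (a sketch; `dict_intersection_loop` is my own name, not part of the solution above):

```
def dict_intersection_loop(*dict_list):
    # What the reduce does: one pass over dict_list, carrying the
    # running intersection along as the accumulator.
    result = dict_list[0]
    for d in dict_list[1:]:
        result = dict(result.items() & d.items())
    return result
```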
Trade-offs
Changing `dict_intersection_v1` to `dict_intersection_v2`, we can see that it performs faster for larger lists of dictionaries and/or larger dictionaries (designing a proper experiment to test which is the larger factor is outside the scope of this solution). The performance gain comes from reducing the number of dictionary instantiations: v1 builds an intermediate dict after every pairwise intersection, while v2 intersects the items views as sets and builds a single dict at the end.
```
def dict_intersection_v1(*dict_list):
    # Builds a new dict after every pairwise intersection.
    return reduce(lambda a, b: dict(a.items() & b.items()), dict_list)

def dict_intersection_v2(*dict_list):
    # Intersects the items views as sets and builds a single dict at the end.
    return dict(reduce(lambda a, b: a & b, (d.items() for d in dict_list)))

dict_lst1 = [{k: k for k in range(0 + n, 5 + n)} for n in (0, 2, 3)]  # == [a, b, c]
dict_lst2 = [{k: k for k in range(0, 50, n)} for n in range(1, 5)]
dict_lst3 = [{k: k for k in range(0, 500, n)} for n in range(1, 41)]  # step must start at 1, not 0
dict_lst4 = [{k: k for k in range(0 + n, 500 + n)} for n in range(400)]
```
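The timings in the table below read like IPython `%timeit` output; if you want to reproduce them with just the standard library, something along these lines should work (the fixed loop count is my own choice, `%timeit` picks it adaptively):

```
import timeit

for name, lst in [("dict_lst1", dict_lst1), ("dict_lst2", dict_lst2),
                  ("dict_lst3", dict_lst3), ("dict_lst4", dict_lst4)]:
    for fn in (dict_intersection_v1, dict_intersection_v2):
        n = 100  # fixed loop count for simplicity
        t = timeit.timeit(lambda: fn(*lst), number=n)
        print(f"{name} {fn.__name__}: {t / n * 1e6:.2f} µs per call")
```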
| dict list | kv pair count | `dict_intersection_v1` | `dict_intersection_v2` | relative difference |
|---|---|---|---|---|
| 1 | 15 | 808 ns ± 4.31 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) | 821 ns ± 0.785 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) | +1.6% |
| 2 | 105 | 3.14 µs ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) | 2.38 µs ± 5.76 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) | -24.2% |
| 3 | 2155 | 36.9 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) | 25.1 µs ± 131 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) | -32.0% |
| 4 | 200_000 | 9.08 ms ± 22 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) | 4.88 ms ± 5.31 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) | -46.3% |
The regression for `dict_lst1` is mainly due to the difference in overhead between creating a dictionary after every intersection (v1) and the `dict.items()` calls within the generator plus Python's general function-call overhead (v2); for inputs this small, the latter dominates.
NB: I did test using a pre-calculated list of `dict.items()` views for each dictionary instead of v2's building the generator on the fly. I tried both passing in the pre-calculated list outside of the timings and building it within the timings, and, while the difference is statistically significant, it's less than 30 µs and 10 µs respectively. If you're trying to get gains beyond that, looking at a different language or Cython might be better.
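For reference, the pre-calculated variant described above would look something like this (a sketch; `dict_intersection_v3` is my own naming, and it assumes the `reduce` import from earlier):

```
def dict_intersection_v3(items_list):
    # Same fold as v2, but over a pre-built list of items views,
    # avoiding the d.items() calls inside the generator.
    return dict(reduce(lambda a, b: a & b, items_list))

items = [d.items() for d in dict_lst4]  # built outside the timed region
dict_intersection_v3(items)
```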