Approach
Using the idea from this answer Convert a list of dictionaries into a set of dictionaries
- Dictionaries in two lists are serialized to string (using json)
- Serialized dictionaries are placed into two sets corresponding to each list
- Difference is computed between the two sets
- Find new elements using set difference
- Deserialize elements in the set different to get list of new elements
Code
from json import dumps, loads
def find_difference(lst1, lst2):
# Returns elements in lst1 which are not in lst2
set1 = dics_to_set(lst1)
set2 = dics_to_set(lst2)
# Deserialize elements in set1 that are not in set2
return [loads(x) for x in set1.difference(set2)] # items in set1 that are not in set2
def dics_to_set(lst):
'''
Convert list of dicts to set
'''
return set(dumps(x, sort_keys=True) for x in lst) # sort_keys to control order of keys
Performance
Summary
- With 100K values 2600X faster (2.06 s vs. 9min 3s)
Test setup:
- List 2: 100K random dictionaries each with 5 keys
- List 1: copy of List 2 plus one additional random dictionary
Test Code
def rand_dicts(n):
'''
Create random dictionary of n elements
'''
mydict = {}
for i in range(n):
mydict[f'key{i}'] = randrange(100)
return mydict
# List of random dictionaries with 5 elements each
lst2 = [rand_dicts(5) for _ in range(100000)]
# Copy of list 2 with one more random dictionary added
lst1 = lst2 + [rand_dicts(1)]
Timing using timeit module
# Test of Posted Code
%timeit [x for x in lst1 if x not in lst2]
# Output: 9min 3s ± 13 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Test of proposed code
%timeit find_difference(lst1, lst2)
Output: 2.06 s ± 90.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)