Given:
main = [
{'id': 1, 'rate': 13, 'type': 'C'},
{'id': 2, 'rate': 39, 'type': 'A'},
{'id': 3, 'rate': 94, 'type': 'A'},
{'id': 4, 'rate': 95, 'type': 'A'},
{'id': 5, 'rate': 96, 'type': 'A'}
]
compare = [
{'id': 119, 'rate': 33, 'type': 'D'},
{'id': 120, 'rate': 94, 'type': 'A'}
]
You can first map the two lists of dicts into two dicts of lists of dicts indexed by type
, and sort sub-lists by rate
:
mappings = []
for lst in main, compare:
mappings.append({})
for entry in lst:
mappings[-1].setdefault(entry['type'], []).append(entry)
for entries in mappings[-1].values():
entries.sort(key=lambda entry: entry['rate'])
main, compare = mappings
so that main
becomes:
{'C': [{'id': 1, 'rate': 13, 'type': 'C'}],
'A': [{'id': 2, 'rate': 39, 'type': 'A'},
{'id': 3, 'rate': 94, 'type': 'A'},
{'id': 4, 'rate': 95, 'type': 'A'},
{'id': 5, 'rate': 96, 'type': 'A'}]}
while compare
becomes:
{'D': [{'id': 119, 'rate': 33, 'type': 'D'}],
'A': [{'id': 120, 'rate': 94, 'type': 'A'}]}
so that you iterate through the matching types of the two dicts in linear time, and use bisect
to find the index in each sub-list of main
where the rate
is greater than that of compare
, which takes a time complexity of O(log n), and then iterate through the rest of the sub-list from that index for processing. Overall this algorithm is of O(n log n) in time complexity, an improvement over the O(n ^ 2) time complexity of your original code:
from bisect import bisect
for type in main.keys() & compare.keys():
for entry in compare[type]:
main_entries = main[type]
for match in main_entries[bisect([d['rate'] for d in main_entries], entry['rate']):]:
print(match['id'], entry['id'])
This outputs:
4 120
5 120
Demo: https://repl.it/repls/EasygoingReadyTechnologies
Disclaimer: This may look like an implementation of @ThomasWeller's solution but I actually did not see his answer until I finished my coding, which was interrupted by my other work. Also @ThomasWeller wants to sort the two lists by type
, which would incur an O(n log n) time complexity, when it can be done in linear time as shown in the for entry in lst
loop in my code.