I have a sorted list of dictionaries like so:
dat = [
    {"id1": 1, "id2": 2, "value": 1},
    {"id1": 1, "id2": 2, "value": 2},
    {"id1": 2, "id2": 2, "value": 2},
    {"id1": 2, "id2": 3, "value": 1},
    {"id1": 3, "id2": 3, "value": 1},
    {"id1": 3, "id2": 4, "value": 1},
    {"id1": 3, "id2": 4, "value": 1},
    {"id1": 3, "id2": 4, "value": 1},
    {"id1": 3, "id2": 4, "value": 1},
]
These are effectively (id1, id2, value) tuples, but with duplicates. I would like to deduplicate them by summing the values of the rows where both ids are equal, leaving unique (id1, id2) pairs whose value is the sum of the duplicates.
That is, from above, the desired output is:
dat = [
    {"id1": 1, "id2": 2, "value": 3},
    {"id1": 2, "id2": 2, "value": 2},
    {"id1": 2, "id2": 3, "value": 1},
    {"id1": 3, "id2": 3, "value": 1},
    {"id1": 3, "id2": 4, "value": 4},
]
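
For concreteness, this is the kind of accumulation I mean. A minimal plain-dict sketch (the names totals and merged are just illustrative) that produces the output above:

from collections import defaultdict

# Sum values per (id1, id2) key; dict insertion order (Python 3.7+)
# preserves the already-sorted order of the input.
totals = defaultdict(int)
for row in dat:
    totals[(row["id1"], row["id2"])] += row["value"]

merged = [{"id1": k1, "id2": k2, "value": v} for (k1, k2), v in totals.items()]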
Assume the list contains millions of entries, with lots of duplicates. What's the most efficient way to do this using itertools or funcy (versus, say, pandas)?
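
Since the list is already sorted by (id1, id2), I imagine something along these lines with itertools.groupby; this is only a sketch, and whether it actually beats the plain-dict loop or pandas on millions of rows is what I'm unsure about:

from itertools import groupby
from operator import itemgetter

key = itemgetter("id1", "id2")

# groupby only merges adjacent rows, which is safe here because
# the input is already sorted by (id1, id2).
merged = [
    {"id1": k[0], "id2": k[1], "value": sum(row["value"] for row in grp)}
    for k, grp in groupby(dat, key=key)
]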