Sum dictionary values, using two keys as index: how to achieve this?

Question

I have the following list of dictionaries:

res = [{'name': 'mfi', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0}, 
{'name': 'serv', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0}, 
{'name': 'inv', 'percentage': 100.0, 'tax_base': 1200.0, 'tax_amount': 168.0}, 
{'name': 'mfi', 'percentage': 50.0, 'tax_base': 1500.0, 'tax_amount': 210.0}, 
{'name': 'none', 'percentage': 0.0, 'tax_base': 1000.0, 'tax_amount': 0.0}, 
{'name': 'none', 'percentage': 0.0, 'tax_base': 900.0, 'tax_amount': 126.0}, 
{'name': 'mfi', 'percentage': 50.0, 'tax_base': 1000.0, 'tax_amount': 140.0}]

From this list, I need to sum the 'tax_base' and 'tax_amount' values, using the 'name' and 'percentage' keys together as the index (one total per name/percentage combination).

As a result, I need:

res_final = [{'name': 'mfi', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0},
{'name': 'mfi', 'percentage': 50.0, 'tax_base': 2500.0, 'tax_amount': 350.0},  
{'name': 'serv', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0}, 
{'name': 'inv', 'percentage': 100.0, 'tax_base': 1200.0, 'tax_amount': 168.0}, 
{'name': 'none', 'percentage': 0.0, 'tax_base': 1900.0, 'tax_amount': 126.0}, 
]

How can I achieve this? Could you provide a sample, please?

Loop over the dictionaries and build a new dictionary whose keys are `name` and `percentage` (can be a tuple) and whose values are dictionaries with the `tax_base` and `tax_amount`; if the entry already exists, add the values to it. Finally, iterate over that dictionary to return it to the form you want. This is just one example; I suppose there are many more (and probably some better) approaches. — Thymen, Feb 28 '21 at 17:50
As I look at this a bit longer, you can also make it a [pd.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) and use the [groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) function with aggregation. See the [docs](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.core.groupby.DataFrameGroupBy.agg.html) for an example. — Thymen, Feb 28 '21 at 17:55
@Thymen, can you please send me a sample of the first approach please? — pmatos, Feb 28 '21 at 18:07

Thymen · Accepted Answer · 2021-02-28T18:50:47.780

Taking your original input, res, and your preferred output, res_final, the following code works:

# Creating a temporary dictionary with the aggregation
temp_result = {}
for each in res:
    key = (each['name'], each['percentage'])
    if key not in temp_result:
        temp_result[key] = dict(tax_base=0, tax_amount=0)

    temp_result[key]['tax_base'] += each['tax_base']
    temp_result[key]['tax_amount'] += each['tax_amount']

# Transforming the temporary dictionary to the correct format
final_result = []
for (name, percentage), taxes in temp_result.items():
    final_result.append(dict(
            name=name,
            percentage=percentage,
            tax_base=taxes['tax_base'],
            tax_amount=taxes['tax_amount']
    ))

for each in final_result:
    print(each)

The result will be:

{'name': 'mfi', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0}
{'name': 'serv', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0}
{'name': 'inv', 'percentage': 100.0, 'tax_base': 1200.0, 'tax_amount': 168.0}
{'name': 'mfi', 'percentage': 50.0, 'tax_base': 2500.0, 'tax_amount': 350.0}
{'name': 'none', 'percentage': 0.0, 'tax_base': 1900.0, 'tax_amount': 126.0}

Explanation

The first part builds a temporary dictionary whose keys are (name, percentage) tuples and whose values are dictionaries holding the running tax_base and tax_amount totals for that key.

For each entry, we check whether its key is already in the dictionary and, if it isn't, create it with both totals set to zero. We then add the entry's tax_base and tax_amount to those totals.

At this point all the information is in one dictionary, but not in the right format. The second part takes care of that: it splits each key back into name and percentage and merges them with the summed tax_base and tax_amount into one dict.
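As a variation on the steps above, the "create the key if it is missing" check can be handed to collections.defaultdict. This is only a sketch under assumptions: the shortened res list and the names totals and sums are made up for illustration, and it relies on dictionaries keeping insertion order (Python 3.7+).

```python
from collections import defaultdict

# Shortened sample rows in the same shape as the question's `res` list.
res = [
    {'name': 'mfi', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0},
    {'name': 'mfi', 'percentage': 50.0, 'tax_base': 1500.0, 'tax_amount': 210.0},
    {'name': 'mfi', 'percentage': 50.0, 'tax_base': 1000.0, 'tax_amount': 140.0},
]

# Every new (name, percentage) key starts with zeroed totals automatically,
# so no explicit "if key not in ..." check is needed.
totals = defaultdict(lambda: {'tax_base': 0.0, 'tax_amount': 0.0})
for row in res:
    key = (row['name'], row['percentage'])
    totals[key]['tax_base'] += row['tax_base']
    totals[key]['tax_amount'] += row['tax_amount']

# Unpack each tuple key back into the flat record layout.
final_result = [
    {'name': name, 'percentage': percentage, **sums}
    for (name, percentage), sums in totals.items()
]

for each in final_result:
    print(each)
```

The behaviour is the same as the explicit version; which one reads better is largely a matter of taste.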

Edit

In case people are wondering how to do it with pd.DataFrame:

import pandas as pd

df = pd.DataFrame(res)
# Use a new name so the original input list `res` is not overwritten
grouped = df.groupby(['name', 'percentage'], as_index=False).sum()
final_result = grouped.to_dict('records')

for each in final_result:
    print(each)

This produces the same totals, but the rows are not guaranteed to be in the same order as the input: by default, groupby sorts the groups by their keys.

Dear @Thymen, this is working as expected. Since the order is not important it is exactly what I need. Thank you — pmatos, Feb 28 '21 at 19:31

score 2 · Answer 2 · answered Feb 28 '21 at 18:47

Here's another (very similar) way of doing it, but a little more concise.

Here we first collect the unique name:percentage pairs, then loop over those unique keys, keeping only the entries of the res list that match each key and summing them.

# A set comprehension keeps only the unique "name:percentage" keys
unique_keys = {f"{d['name']}:{d['percentage']}" for d in res}
output = []

for key in unique_keys:
    # Split on the last colon only, so a name that itself contains ":" still parses
    name, percentage = key.rsplit(":", 1)
    matching_entries = [d for d in res if d['name'] == name and str(d['percentage']) == percentage]
    
    summed = {"name": name, "percentage": float(percentage), "tax_base": 0.0, "tax_amount": 0.0}
    for entry in matching_entries:
        summed["tax_base"] += entry.get("tax_base", 0) # use get in case value is not in dictionary, 0 is default value
        summed["tax_amount"] += entry.get("tax_amount", 0)
    
    output.append(summed)

output.sort(key=lambda d: d['name']) # sorting to organize a bit

Output:

[{'name': 'inv', 'percentage': 100.0, 'tax_base': 1200.0, 'tax_amount': 168.0},
 {'name': 'mfi', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0},
 {'name': 'mfi', 'percentage': 50.0, 'tax_base': 2500.0, 'tax_amount': 350.0},
 {'name': 'none', 'percentage': 0.0, 'tax_base': 1900.0, 'tax_amount': 126.0},
 {'name': 'serv', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0}]
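The same "find the unique keys, then sum the matching entries" idea can also be sketched with itertools.groupby, which avoids building string keys and parsing them back. This is a minimal sketch rather than the answer's own method: the shortened res list and the name group_key are assumptions made for illustration. Because the rows are sorted by (name, percentage) first, the output order is deterministic.

```python
from itertools import groupby
from operator import itemgetter

# Shortened sample rows in the same shape as the question's `res` list.
res = [
    {'name': 'serv', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0},
    {'name': 'mfi', 'percentage': 50.0, 'tax_base': 1500.0, 'tax_amount': 210.0},
    {'name': 'inv', 'percentage': 100.0, 'tax_base': 1200.0, 'tax_amount': 168.0},
    {'name': 'mfi', 'percentage': 50.0, 'tax_base': 1000.0, 'tax_amount': 140.0},
    {'name': 'mfi', 'percentage': 100.0, 'tax_base': 1000.0, 'tax_amount': 140.0},
]

# Returns the (name, percentage) tuple for a row.
group_key = itemgetter('name', 'percentage')

output = []
# groupby only merges adjacent items, so the rows are sorted by the same key first.
for (name, percentage), rows in groupby(sorted(res, key=group_key), key=group_key):
    rows = list(rows)  # the group iterator can be consumed only once
    output.append({
        'name': name,
        'percentage': percentage,
        'tax_base': sum(r['tax_base'] for r in rows),
        'tax_amount': sum(r['tax_amount'] for r in rows),
    })

for each in output:
    print(each)
```

Note that within one name the rows come out in ascending percentage order here, which differs from the ordering shown in the output above.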

Hi @djvaroli, perhaps I missed something here. It's returning None when trying to print output.sort. The answer from Thymen solved my issue but anyway, I wanted to test your approach in order to understand it also — pmatos, Feb 28 '21 at 19:30
@pmatos, the reason you get `None` is that `output.sort()` sorts in place: it changes the list itself but doesn't return a new list. You can see the result by printing `output` (the variable). Also see [this](https://stackoverflow.com/questions/22442378/what-is-the-difference-between-sortedlist-vs-list-sort) post for a more elaborate answer. — Thymen, Mar 01 '21 at 00:10
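
To make the difference concrete, here is a small sketch comparing list.sort() with sorted(). The three-item list and the names ordered_copy and returned are made up for illustration.

```python
output = [{'name': 'serv'}, {'name': 'inv'}, {'name': 'mfi'}]

# sorted() builds a new list and leaves the original untouched.
ordered_copy = sorted(output, key=lambda d: d['name'])

# list.sort() reorders the list in place and returns None.
returned = output.sort(key=lambda d: d['name'])

print(returned)  # the return value of list.sort()
print(output)    # the list itself, now sorted by name
```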
