How can I combine values by key in a list of Python dicts, like SQL GROUP BY?

Question

L = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}]

I want to sum quantity based on id.

So for the list above I would like the output to be:

 [{'id':1,'quantity':4},{'id':2,'quantity':2}]

Another example:

L = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':2}, {'id':1, 'quantity':3}]

So for the list above I would like the output to be:

 [{'id':1, 'quantity':6}, {'id':2, 'quantity':2}]

Because L[0] and L[2] have the same key id = 1, their quantity values are summed. – zhang olve, Nov 02 '17 at 08:45
But the `id` should have been `1` and not `4`; however, it looks like someone has edited it for you. – Ashish Ranjan, Nov 02 '17 at 08:46
Duplicate of https://stackoverflow.com/questions/28401904/python-collections-counter-for-a-list-of-dictionaries – Benjamin, Nov 02 '17 at 08:49
You are right, that was my typo. Sorry for confusing you. – zhang olve, Nov 02 '17 at 09:00

score 2 · Answer 1 · RomanPerekhrest · 2017-11-02T08:51:58.853

In Python, "group by" functionality can be achieved with the itertools.groupby() function. It only groups consecutive items, so the list has to be sorted by the key first:

from itertools import groupby
from operator import itemgetter

l = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}]
get_id = itemgetter('id')

# groupby() only merges adjacent items, so sort by id before grouping
result = [{'id': k, 'quantity': sum(d['quantity'] for d in g)}
          for k, g in groupby(sorted(l, key=get_id), key=get_id)]

print(result)

The output:

[{'id': 1, 'quantity': 4}, {'id': 2, 'quantity': 2}]
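To see why the sort matters, here is a small sketch that runs the same grouping on the unsorted and the sorted list. The helper name `sum_by_id` and the variable `rows` are made up for this illustration and are not part of the answer above.

```python
from itertools import groupby
from operator import itemgetter

rows = [{'id': 1, 'quantity': 1}, {'id': 2, 'quantity': 2}, {'id': 1, 'quantity': 3}]
get_id = itemgetter('id')

def sum_by_id(items):
    """Sum 'quantity' for each run of consecutive items sharing an 'id'."""
    return [{'id': k, 'quantity': sum(d['quantity'] for d in g)}
            for k, g in groupby(items, key=get_id)]

# Without sorting, id 1 shows up as two separate groups
unsorted_result = sum_by_id(rows)

# After sorting, all items with the same id are adjacent and merge into one group
sorted_result = sum_by_id(sorted(rows, key=get_id))

print(unsorted_result)
print(sorted_result)
```

On the unsorted input the two `id` 1 entries stay separate, which is usually not what a SQL-style GROUP BY is expected to do.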

edited Nov 02 '17 at 08:51

answered Nov 02 '17 at 08:49

RomanPerekhrest

but you are traversing the list once for every unique `id` like that, right? That's quite inefficient. – Ma0 Nov 02 '17 at 08:51

score 2 · Answer 2 · answered Nov 02 '17 at 08:54

This should do what you want:

from collections import defaultdict

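def combine(items):
    # Accumulate the total quantity for each id in a single pass
    counts = defaultdict(int)
    for d in items:
        counts[d["id"]] += d["quantity"]

    # Use names that do not shadow the built-in id()
    return [{"id": key, "quantity": total} for key, total in counts.items()]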

Examples:

>>> combine([{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}])
[{'quantity': 4, 'id': 1}, {'quantity': 2, 'id': 2}]

>>> combine([{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':2}, {'id':1, 'quantity':3}])
[{'quantity': 6, 'id': 1}, {'quantity': 2, 'id': 2}]

This makes a single pass over the list, so it is about as simple and efficient as you're going to get.
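If the key and the summed field vary, the same idea can be parameterized. This is only a sketch: the function name `group_sum`, its parameters `key` and `field`, and the sample list `orders` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def group_sum(items, key, field):
    """Sum items[i][field] per distinct items[i][key] (hypothetical helper)."""
    totals = defaultdict(int)
    for d in items:
        totals[d[key]] += d[field]
    # One output dict per distinct key value, in first-seen order on Python 3.7+
    return [{key: k, field: total} for k, total in totals.items()]

orders = [
    {'id': 1, 'quantity': 1},
    {'id': 2, 'quantity': 2},
    {'id': 1, 'quantity': 2},
    {'id': 1, 'quantity': 3},
]

combined = group_sum(orders, 'id', 'quantity')
print(combined)
```

Called with 'id' and 'quantity', it behaves like combine() above.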

score 1 · Answer 3 · answered Nov 02 '17 at 08:48

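Convert it to a DataFrame and then back to a dict. Note that this returns a mapping of id to summed quantity, {1: 4, 2: 2}, rather than a list of dicts:

import pandas as pd
L = [{'id':1, 'quantity':1}, {'id':2, 'quantity':2}, {'id':1, 'quantity':3}]
output = pd.DataFrame(L).groupby('id')['quantity'].sum().to_dict()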
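To get the list-of-dicts shape the question asks for, one option is to turn the grouped result back into a two-column DataFrame and export it row by row. This is a sketch assuming pandas is installed; the names `summed` and `records` are just for illustration.

```python
import pandas as pd

L = [{'id': 1, 'quantity': 1}, {'id': 2, 'quantity': 2}, {'id': 1, 'quantity': 3}]

# reset_index() turns the 'id' index back into a regular column
summed = pd.DataFrame(L).groupby('id')['quantity'].sum().reset_index()

# orient='records' yields one dict per row
records = summed.to_dict(orient='records')
print(records)
```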

answered Nov 02 '17 at 08:48

Binyamin Even

score 0 · Answer 4 · answered Nov 02 '17 at 08:55

Assuming the input L is defined as in the question, here is an intuitive way to achieve this:

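output = {}
for e in L:
    # Dict membership is a constant-time check, so no separate list of seen keys is needed
    if e['id'] not in output:
        output[e['id']] = e['quantity']
    else:
        output[e['id']] += e['quantity']

# Build the result with the 'quantity' key the question expects
result = [{'id': key, 'quantity': value} for key, value in output.items()]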

Are there any further requirements? For instance, do you need higher efficiency to process a huge volume of data? If so, this method may be too cumbersome.
