Making same list with dicts but only with unique records

Question

list = [
    {'status': u'Purchase', 'phantom': False, 'row_no': 1, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 2, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 3, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 4, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 5, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0}  
]

I have this list with dicts in it, as you can see there are two lines with product_id 25872, and tree lines with product_id 25875.

How can I go over all dicts in my list and make the same list with dicts but only 1 line per product? and 'qty' should sum up.

So from this list, I want to get output like

list = [
    {'status': u'Purchase', 'phantom': False, 'row_no': 1, 'product_id': 25872, 'standard_price': 14.0, 'qty': 2.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 2, 'product_id': 25875, 'standard_price': 14.0, 'qty': 3.0, 'cost': 14.0},
]

If not already sorted, `sort` the list using `key` function that returns the product id. Then use `itertools.groupby` with the same key function and take one item from each group with summing up `qty` for the whole group. — Michael Butscher, Jun 09 '19 at 20:11
Possible duplicate of [How to aggregate particular property value that is group by a particular property in a list](https://stackoverflow.com/questions/42970312/how-to-aggregate-particular-property-value-that-is-group-by-a-particular-propert) — VPfB, Jun 09 '19 at 20:12
You can look at this very similar question https://stackoverflow.com/questions/55317251/how-to-remove-duplicates-from-list-of-dicts — Heladio Amaya, Jun 09 '19 at 20:13

score 2 · Answer 1 · answered Jun 09 '19 at 21:20

I think itertools.groupby should be enough for this with some help from sum and python's list comprehensions. Try this:

from itertools import groupby

lst = [
        {'status': u'Purchase', 'phantom': False, 'row_no': 1, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
        {'status': u'Purchase', 'phantom': False, 'row_no': 2, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
        {'status': u'Purchase', 'phantom': False, 'row_no': 3, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
        {'status': u'Purchase', 'phantom': False, 'row_no': 4, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
        {'status': u'Purchase', 'phantom': False, 'row_no': 5, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0}  
]

# Sort the list first, by `product_id`
lst = sorted(lst, key=lambda x:x['product_id'])

# This is where we will store our unique rows
agg = []
row_count = 1

for k,v in groupby(lst,key=lambda x:x['product_id']):
        as_list = list(v)
        as_list[0].update({
                'qty': sum([row['qty'] for row in as_list]),
                'row_no': row_count
        })
        agg.append(as_list[0])
        row_count += 1

# Print the final result
print(agg)

Note: Please do not use list as a variable name.

Alain T. · Accepted Answer · 2019-06-10T15:17:49.260

You can create a dictionary of dictionaries using the product_id as key to make entries unique. Then get .values() from that grouping dictionary. To add up the quantities, go through the merged entries and update the "qty" entry with the sum of corresponding values in the list. Same thing for the row numbers (if needed).

list1 = [
    {'status': u'Purchase', 'phantom': False, 'row_no': 1, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 2, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 3, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 4, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 5, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0, 'cost': 14.0}  
]

pid   = "product_id"
merged = {d[pid]:d for d in list1}.values()
merged = [{**m,"qty":sum(ld["qty"] for ld in list1 if ld[pid]==m[pid])} for m in merged]
merged = [{**m,"row_no":i+1} for i,m in enumerate(merged)]

print(merged)

[{'status': 'Purchase', 'phantom': False, 'row_no': 1, 'product_id': 25872, 'standard_price': 14.0, 'qty': 2.0, 'cost': 14.0},
 {'status': 'Purchase', 'phantom': False, 'row_no': 2, 'product_id': 25875, 'standard_price': 14.0, 'qty': 3.0, 'cost': 14.0}]

Based on @OP's output it seems that you'll need to update `row_no` also - not just `qty`. — shriek, Jun 10 '19 at 13:43

score 1 · Answer 3 · answered Jun 09 '19 at 20:36

list = [
    {'status': u'Purchase', 'phantom': False, 'row_no': 1, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0,
     'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 2, 'product_id': 25872, 'standard_price': 14.0, 'qty': 1.0,
     'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 3, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0,
     'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 4, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0,
     'cost': 14.0},
    {'status': u'Purchase', 'phantom': False, 'row_no': 5, 'product_id': 25875, 'standard_price': 14.0, 'qty': 1.0,
     'cost': 14.0}

]
import pandas as pd
df = pd.DataFrame(list)
print (df)
print (df.groupby('product_id', as_index=False)
         .agg({'status':'first','phantom':'first','row_no':'count','standard_price':'first','qty':'sum'})
         .to_dict(orient='records'))

this doesnot solves row_no problem still, so i will try.

Making same list with dicts but only with unique records

3 Answers3