How to remove a repeated dictionary in a list of dictionaries?

Question

I have a list in Python:

[{u'key': u'Central District', u'doc_count': 21468},
 {u'key': u'Central District', u'doc_count': 6190},
 {u'key': u'Central District', u'doc_count': 2060},
 {u'key': u'Mexico', u'doc_count': 1884}]

but I need to turn it into this:

[{u'key': u'Central District', u'doc_count':  29718},
 {u'key': u'Mexico', u'doc_count': 1884}]

How can I eliminate one of the repeated elements (in this case "Central District"), and get the sum of the doc_count values of each "Central District"?

You aren't *eliminating* "repeat elements"; you are *combining* data for elements with the same key. If you can write code to identify the "repeat elements", you should be able to accomplish what you want. — Scott Hunter, Nov 15 '15 at 23:08
or http://stackoverflow.com/questions/19882641/sum-value-of-two-different-dictionaries-which-is-having-same-key — sobolevn, Nov 15 '15 at 23:26

Patrick Steadman · Answer 1 · 2015-11-15T23:39:03.713

Itertools and reduce can help sum the values grouped by key.

from functools import reduce  # built in on Python 2; must be imported on Python 3
from itertools import groupby

original = [{u'key': u'Central District', u'doc_count': 21468},
            {u'key': u'Central District', u'doc_count': 6190},
            {u'key': u'Central District', u'doc_count': 2060},
            {u'key': u'Mexico', u'doc_count': 1884}]

def sum_reduce(obj1, obj2):
    # Merge two records that share a key by adding their doc_count values
    return {'key': obj1['key'], 'doc_count': obj1['doc_count'] + obj2['doc_count']}

# groupby only merges *adjacent* items with equal keys; in this list each key's entries are already consecutive
combined = [reduce(sum_reduce, group) for _, group in groupby(original, lambda x: x['key'])]

print(combined)
# output:
# [{'key': 'Central District', 'doc_count': 29718}, {'key': 'Mexico', 'doc_count': 1884}]
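`itertools.groupby` only groups *consecutive* items with equal keys, so it gives the right answer on the sample list because each key's entries already sit next to each other. A minimal sketch for input that is not already ordered, assuming the same `key`/`doc_count` layout: sort by key first (here with `operator.itemgetter`), and sum each group with the built-in `sum` instead of `reduce`, which also builds a fresh dict for every group. The `data` list is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the two 'Central District' entries are not adjacent
data = [{u'key': u'Central District', u'doc_count': 21468},
        {u'key': u'Mexico', u'doc_count': 1884},
        {u'key': u'Central District', u'doc_count': 6190}]

by_key = itemgetter('key')

# Without sorting, groupby starts a new group every time the key changes
unsorted_groups = [key for key, _ in groupby(data, key=by_key)]
print(unsorted_groups)  # ['Central District', 'Mexico', 'Central District']

# Sort first so equal keys become adjacent, then add up each group's counts
combined = [{'key': key, 'doc_count': sum(d['doc_count'] for d in group)}
            for key, group in groupby(sorted(data, key=by_key), key=by_key)]

print(combined)
# [{'key': 'Central District', 'doc_count': 27658}, {'key': 'Mexico', 'doc_count': 1884}]
```

Sorting costs O(n log n); if the original order matters, the dictionary-based approach in the next answer avoids the sort entirely.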

score 0 · Answer 2 · answered Nov 15 '15 at 23:22

I don't know why you are using such a bad data structure.

Here's what I would do:

old_data = [{u'key': u'Central District', u'doc_count': 21468},
            {u'key': u'Central District', u'doc_count': 6190},
            {u'key': u'Central District', u'doc_count': 2060},
            {u'key': u'Mexico', u'doc_count': 1884}]

# Store the data as {location: doc_count}
new_data = {}

for values in old_data:
    if values['key'] not in new_data:
        new_data[values['key']] = values['doc_count']
    else:
        new_data[values['key']] += values['doc_count']

print(new_data)

Outputs:

{u'Central District': 29718, u'Mexico': 1884}

The purpose of a dictionary is to group related data and access it by key. Your only key is literally 'key', and you are using a list to store dicts, which is madness.

In my example you can look up a location directly with 'Mexico' or 'Central District' as the key, and the returned value will be its doc count!
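A minimal sketch of the same idea using `collections.Counter` (a swapped-in standard-library alternative, not this answer's code). A `Counter` treats a missing key as 0, so the membership check disappears. If something downstream still needs the list-of-dicts shape from the question, the totals convert back in one comprehension; the order shown assumes Python 3.7+, where dicts (and therefore `Counter`) keep insertion order.

```python
from collections import Counter

old_data = [{u'key': u'Central District', u'doc_count': 21468},
            {u'key': u'Central District', u'doc_count': 6190},
            {u'key': u'Central District', u'doc_count': 2060},
            {u'key': u'Mexico', u'doc_count': 1884}]

# Missing keys read as 0, so += works on the first occurrence too
totals = Counter()
for item in old_data:
    totals[item['key']] += item['doc_count']

print(totals['Mexico'])  # 1884

# Convert back to the list-of-dicts layout the question asked for
combined = [{'key': key, 'doc_count': count} for key, count in totals.items()]
print(combined)
# [{'key': 'Central District', 'doc_count': 29718}, {'key': 'Mexico', 'doc_count': 1884}]
```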

NotAnAmbiTurner · Answer 3 · 2015-11-15T23:16:06.713

list_of_dicts = [{u'key': u'Central District', u'doc_count': 21468},
                 {u'key': u'Central District', u'doc_count': 6190},
                 {u'key': u'Central District', u'doc_count': 2060},
                 {u'key': u'Mexico', u'doc_count': 1884}]

def do_stuff(list_of_dicts):
    to_count = u'Central District'  # the key whose doc_count values get summed
    to_count_sum = 0
    res_list = []
    for dictry in list_of_dicts:
        if dictry[u'key'] == to_count:
            to_count_sum += dictry[u'doc_count']
        else:
            # Entries with any other key are passed through unchanged
            res_list.append(dictry)
    # Append a single combined entry for the summed key at the end
    dicty = {u'key': to_count,
             u'doc_count': to_count_sum}
    res_list.append(dicty)
    return res_list

assert do_stuff(list_of_dicts) == [{'key': 'Mexico', 'doc_count': 1884}, {'key': 'Central District', 'doc_count': 29718}]

OP didn't say it had to. It's also easy enough to do. I'm assuming for now that he just wants to sum the elements of certain dictionaries then write that back to a DB. — NotAnAmbiTurner, Nov 15 '15 at 23:17
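The function above only sums entries whose key is 'Central District' and moves that entry to the end. A minimal sketch that combines every key in one pass while keeping first-seen order, as in the question's expected output. The name `combine_by_key` and its `key_field`/`count_field` parameters are hypothetical, and the ordering relies on Python 3.7+ dict insertion order.

```python
def combine_by_key(list_of_dicts, key_field=u'key', count_field=u'doc_count'):
    """Sum count_field for each distinct key_field value (hypothetical helper)."""
    totals = {}  # insertion-ordered on Python 3.7+, so first-seen order is kept
    for entry in list_of_dicts:
        name = entry[key_field]
        totals[name] = totals.get(name, 0) + entry[count_field]
    # Rebuild one dict per distinct key in the original layout
    return [{key_field: name, count_field: total} for name, total in totals.items()]

list_of_dicts = [{u'key': u'Central District', u'doc_count': 21468},
                 {u'key': u'Central District', u'doc_count': 6190},
                 {u'key': u'Central District', u'doc_count': 2060},
                 {u'key': u'Mexico', u'doc_count': 1884}]

result = combine_by_key(list_of_dicts)
print(result)
# [{'key': 'Central District', 'doc_count': 29718}, {'key': 'Mexico', 'doc_count': 1884}]
```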
