How do I remove a duplicate dict in list, ignoring a dict key?

Question

I have a list of dictionaries. Each dictionary has several key-value pairs, plus a single arbitrary (but important) key-value pair. For example:

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

I would like to remove the duplicate dictionaries such that only the "ignore_key" value is ignored when comparing. I have seen a related question on SO, but it only considers entirely identical dicts. Is there a way to remove the near-duplicates so that the data above becomes:

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}
]

It doesn't matter which of the duplicates is ignored. How can I do this?

Thank you for your responses. Sorry - the example situation wasn't clear. There are multiple key-value pairs and only one key to ignore. — user4467853, May 17 '15 at 14:38
@DSM Yes, the values are always hashable (text and datetime objects). — user4467853, May 17 '15 at 14:39
@user4467853, I presumed they are grouped or how do you decide which to keep? — Padraic Cunningham, May 17 '15 at 14:44
Somewhat overkill but you could do no_ignore = a_dict.copy(), del no_ignore ["ignore_key"]. Then apply Ami's add as you filter recipe to add a_dict while filtering on no_ignore. — JL Peyret, May 17 '15 at 14:45
Also are there the same keys in each dict and can different keys have the same values? — Padraic Cunningham, May 17 '15 at 15:04

Padraic Cunningham · Answer 1 · 2015-05-17T15:19:53.223

Keep a set of the seen values for key and k2, and remove any dict that has the same values:

st = set()

for d in thelist[:]:
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If the values are always grouped, you can use the value from key to group and get the first dict from each group:

from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist,itemgetter("key","k2"))]
print(thelist)

[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]

Or using a generator similar to DSM's answer to modify the original list without copying:

def filt(l):
    st = set()
    for d in l:
        vals = d["key"],d["k2"]
        if vals not in st:
            yield d
        st.add(vals)


thelist[:] = filt(thelist)

print(thelist)

 [{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If you don't care which dupe is removed, just iterate over reversed(thelist), which keeps the last of each set of duplicates:

st = set()

for d in reversed(thelist):
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

To compare on every key except ignore_key using groupby:

from itertools import groupby

thelist[:] = [next(v) for _, v in groupby(thelist, lambda d: 
                [val for k, val in d.items() if k != "ignore_key"])]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
 {'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
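
One caveat with the groupby variants: groupby only merges adjacent items, so duplicates separated by other dicts survive. A minimal sketch of one way around that, assuming the non-ignored values can be compared with each other so the list can be sorted first (the names compare_key and dedupe_sorted and the reordered sample data are made up for illustration):

```python
from itertools import groupby

def compare_key(d, ignore="ignore_key"):
    # Sorted (key, value) pairs for everything except the ignored key,
    # so dicts with the same content compare equal regardless of key order.
    return sorted((k, v) for k, v in d.items() if k != ignore)

def dedupe_sorted(dicts, ignore="ignore_key"):
    # Sorting first puts duplicates next to each other. This changes the
    # order of the result and needs values that support comparison.
    def key(d):
        return compare_key(d, ignore)
    ordered = sorted(dicts, key=key)
    return [next(group) for _, group in groupby(ordered, key=key)]

data = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},  # not adjacent to its duplicate
]
result = dedupe_sorted(data)
print(result)
```

If the original order matters, the set-based loops above are probably the better fit, since they do not need sorting at all.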

score 3 · Answer 2 · answered May 17 '15 at 14:39

You could cram things into a line or two, but I think it's cleaner just to write a function:

def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k,v) for k,v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)

which gives

>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}, 
 {'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]

This assumes your values are hashable. (If they're not, you can build index as a plain dict of the non-ignored items instead of a frozenset, and use seen = [] with seen.append(index), although it'll have bad performance for long lists.)
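
As a rough sketch of that list-based fallback: since frozenset itself needs hashable items, this version compares plain dicts with == instead. The function name unique_by_equality and the "tags" field are assumptions for illustration, not part of the question's data.

```python
def unique_by_equality(seq, ignore_keys):
    seen = []  # comparison dicts already yielded; membership test uses ==
    for elem in seq:
        # Plain dict of the non-ignored items; works even if values are lists.
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:
            seen.append(index)
            yield elem

rows = [
    {"key": "value1", "tags": ["a"], "ignore_key": "arb1"},
    {"key": "value2", "tags": ["b", "c"], "ignore_key": "arb11"},
    {"key": "value2", "tags": ["b", "c"], "ignore_key": "arb113"},
]
deduped = list(unique_by_equality(rows, ["ignore_key"]))
print(deduped)
```

Each membership test scans the whole seen list, so this is quadratic in the number of distinct dicts.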

score 1 · Answer 3 · answered May 17 '15 at 14:33

Starting off with your original list:

thelist = [
    {"key" : "value1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

Create a set, and populate it while filtering the list.

uniques, theNewList = set(), []
for d in thelist:
    cur = d["key"] # Avoid multiple lookups of the same thing
    if cur not in uniques:
        theNewList.append(d)
    uniques.add(cur)

Finally, rename the list:

thelist = theNewList

score 0 · Answer 4 · answered May 17 '15 at 14:29

Instead of using a list of dicts, you could use a dict of dicts. The value of "key" in each of your dicts would become the key in the outer dict.

Like this:

thedict = {}

thedict["value1"] = {"ignore_key" : "arb1", ...}  
thedict["value2"] = {"ignore_key" : "arb11", ...}

Since the dict wouldn't allow duplicate keys your problem wouldn't exist.
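
If the data already arrives as a list, one possible way to build such a dict is sketched below. It assumes the fields that define a duplicate are "key" and "k2" (as in the question's example) and uses setdefault so the first dict seen for each combination is kept.

```python
thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

thedict = {}
for d in thelist:
    # Composite outer key built from the fields that define a duplicate.
    # setdefault only stores d if that key is not present yet.
    thedict.setdefault((d["key"], d["k2"]), d)

deduped = list(thedict.values())
print(deduped)
```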

score 0 · Answer 5 · answered May 17 '15 at 14:42

Without changing thelist:

result = []
seen = set()
thelist = [
    {"key" : "value1", "ignore_key" : "arb1"},
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

for item in thelist:
    if item['key'] not in seen:
        result.append(item)
        seen.add(item['key'])

print(result)

score 0 · Answer 6 · answered May 17 '15 at 14:56

Create a set of the unique values and check against (& update) that:

values = {d['key'] for d in thelist}
newlist = []

for d in thelist:
    if d['key'] in values:
        newlist.append(d)
        values -= {d['key']}

thelist = newlist

Answered by Brian Lee

score 0 · Answer 7 · edited May 23 '17 at 12:18

You can adapt the accepted answer to the linked question by using a dictionary instead of a set to remove duplicates.

The following first builds a temporary dictionary whose keys are tuples of the items in each dictionary in thelist, except for the ignored one, which is saved as the value associated with each key. Doing so eliminates duplicates since they become the same key, yet preserves the ignored key and its value (from the last or only one seen).

The second step recreates thelist by creating dictionaries composed of a combination of each key plus its associated value from the items in the temporary dictionary.

You could combine these two steps into a completely unreadable one-liner if you wished...

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

IGNORED = "ignore_key"
temp = dict((tuple(item for item in d.items() if item[0] != IGNORED),
             (IGNORED, d.get(IGNORED))) for d in thelist)
# Python 3: dict.items() replaces iteritems(), and print is a function
thelist = [dict(key + (value,)) for key, value in temp.items()]

for item in thelist:
    print(item)

Output:

{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb113'}
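
For comparison, a Python 3 sketch of the same idea in a single pass, assuming hashable values. It keys a dict comprehension on a frozenset of the non-ignored items (so key order inside each dict does not matter) and stores the whole original dict as the value. The name by_content is made up for illustration.

```python
IGNORED = "ignore_key"

thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]

# Later duplicates overwrite earlier ones, so the last one seen is kept.
by_content = {
    frozenset((k, v) for k, v in d.items() if k != IGNORED): d
    for d in thelist
}
deduped = list(by_content.values())

for item in deduped:
    print(item)
```

As with the tuple-based version, the surviving duplicate is the last one in the list.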
