4

I have a list of dictionaries. Each dictionary has several key-value pairs, plus a single arbitrary (but important) key-value pair. For example:

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

I would like to remove duplicate dictionaries, where two dictionaries count as duplicates if all of their values other than the "ignore_key" value match. I have seen a related question on SO, but it only considers entirely identical dicts. Is there a way to remove the almost-duplicates such that the data above becomes

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"}
]

It doesn't matter which of the duplicates is ignored. How can I do this?


7 Answers

5

Keep a set of the seen values for key and k2, and remove any dict whose values have already been seen:

st = set()

for d in thelist[:]:
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

[{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'},
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If the duplicates are always adjacent, you can group on the values of key and k2 with groupby and take the first dict from each group:

from itertools import groupby
from operator import itemgetter
thelist[:] = [next(v) for _, v in groupby(thelist,itemgetter("key","k2"))]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}, 
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
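
Note that groupby only merges adjacent items, so the approach above misses duplicates that are not next to each other. Here is a minimal sketch that sorts on the non-ignored items first, assuming all the values are orderable (e.g. strings). The names dedupe_sorted and signature are made up for this illustration, and the result comes back in sorted order rather than the original order:

```python
from itertools import groupby

def dedupe_sorted(dicts, ignore_key="ignore_key"):
    """Sort by the non-ignored items, then keep the first dict of each group."""
    def signature(d):
        # Sorted (key, value) pairs, so dict key order does not matter.
        return sorted((k, v) for k, v in d.items() if k != ignore_key)

    # sorted() is stable, so the first duplicate in the input stays first.
    ordered = sorted(dicts, key=signature)
    return [next(group) for _, group in groupby(ordered, key=signature)]

# Same data as the question, but with the duplicates no longer adjacent.
thelist = [
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]
result = dedupe_sorted(thelist)
print(result)
```

If the original order matters, the set-based versions are probably the better fit.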

Or using a generator similar to DSM's answer to modify the original list without copying:

def filt(l):
    st = set()
    for d in l:
        vals = d["key"],d["k2"]
        if vals not in st:
            yield d
        st.add(vals)


thelist[:] = filt(thelist)

print(thelist)

 [{'k2': 'va1', 'ignore_key': 'arb1', 'key': 'value1'}, 
{'k2': 'va2', 'ignore_key': 'arb11', 'key': 'value2'}]

If you don't care which dupe is removed, just use reversed:

st = set()

for d in reversed(thelist):
    vals = d["key"],d["k2"]
    if vals in st:
        thelist.remove(d)
    st.add(vals)
print(thelist)

To compare on every key except ignore_key, using groupby:

from itertools import groupby

thelist[:] = [next(v) for _, v in groupby(thelist, lambda d: 
                [val for k, val in d.items() if k != "ignore_key"])]
print(thelist)
[{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'},
 {'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb11'}]
Padraic Cunningham
3

You could cram things into a line or two, but I think it's cleaner just to write a function:

def f(seq, ignore_keys):
    seen = set()
    for elem in seq:
        index = frozenset((k,v) for k,v in elem.items() if k not in ignore_keys)
        if index not in seen:
            yield elem
            seen.add(index)

which gives

>>> list(f(thelist, ["ignore_key"]))
[{'ignore_key': 'arb1', 'k2': 'va1', 'key': 'value1'}, 
 {'ignore_key': 'arb11', 'k2': 'va2', 'key': 'value2'}]

This assumes your values are hashable. (If they're not, build index as a plain dict of the non-ignored items instead of a frozenset, and use seen = [] with seen.append(index), although it'll have bad performance for long lists.)
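
As a rough sketch of that fallback for unhashable values: the function name dedupe_unhashable and the sample data are invented for illustration, and the membership test is a linear scan, so this is quadratic in the length of the list.

```python
def dedupe_unhashable(seq, ignore_keys):
    seen = []  # a list, because dicts holding unhashable values cannot go in a set
    for elem in seq:
        # Plain dict of the non-ignored items; compared with ==, never hashed.
        index = {k: v for k, v in elem.items() if k not in ignore_keys}
        if index not in seen:  # linear scan
            seen.append(index)
            yield elem

# Hypothetical data whose values are lists (unhashable).
data = [
    {"key": ["a", "b"], "ignore_key": "arb1"},
    {"key": ["c"], "ignore_key": "arb11"},
    {"key": ["c"], "ignore_key": "arb113"},
]
result = list(dedupe_unhashable(data, ["ignore_key"]))
print(result)
```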

DSM
1

Starting off with your original list:

thelist = [
    {"key" : "value1", "ignore_key" : "arb1"}, 
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

Create a set, and populate it while filtering the list.

uniques, theNewList = set(), []
for d in thelist:
    cur = d["key"]  # Avoid multiple lookups of the same thing
    if cur not in uniques:
        theNewList.append(d)
    uniques.add(cur)

Finally, rename the list:

thelist = theNewList
Ami Tavory
0

Instead of using a list of dicts you could use a dict of dicts. The value stored under "key" in each of your dicts would become that dict's key in the main dict.

Like this:

thedict = {}

thedict["value1"] = {"ignore_key" : "arb1", ...}  
thedict["value2"] = {"ignore_key" : "arb11", ...}

Since the dict wouldn't allow duplicate keys your problem wouldn't exist.
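
One possible way to build such a dict from the existing list, as a sketch only: the helper name index_by_key is hypothetical, and it assumes the identifying fields are "key" and "k2", combined into a tuple, rather than "key" alone. setdefault keeps the first dict seen for each identity.

```python
def index_by_key(dicts, key_fields=("key", "k2")):
    index = {}
    for d in dicts:
        # Tuple of the identifying values; later duplicates are skipped.
        index.setdefault(tuple(d[f] for f in key_fields), d)
    return index

thelist = [
    {"key": "value1", "k2": "va1", "ignore_key": "arb1"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb11"},
    {"key": "value2", "k2": "va2", "ignore_key": "arb113"},
]
thedict = index_by_key(thelist)
print(list(thedict.values()))
```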

patricia
0

Without changing thelist:

result = []
seen = set()
thelist = [
    {"key" : "value1", "ignore_key" : "arb1"},
    {"key" : "value2", "ignore_key" : "arb11"},
    {"key" : "value2", "ignore_key" : "arb113"}
]

for item in thelist:
    if item['key'] not in seen:
        result.append(item)
        seen.add(item['key'])

print(result)
f43d65
0

Create a set of the unique values and check against (& update) that:

values = {d['key'] for d in thelist}
newlist = []

for d in thelist:
    if d['key'] in values:
        newlist.append(d)
        values -= {d['key']}

thelist = newlist
Brian Lee
0

You can adapt the accepted answer to the linked question by using a dictionary instead of a set to remove duplicates.

The following first builds a temporary dictionary whose keys are tuples of the items in each dictionary in thelist, except for the ignored one, which is saved as the value associated with that key. Doing so eliminates duplicates, since they produce the same key, yet preserves the ignored key and its value (from the last or only one seen).

The second step recreates thelist by creating dictionaries composed of a combination of each key plus its associated value from the items in the temporary dictionary.

You could combine these two steps into a completely unreadable one-liner if you wished...

thelist = [
    {"key" : "value1", "k2" : "va1", "ignore_key" : "arb1"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb11"},
    {"key" : "value2", "k2" : "va2", "ignore_key" : "arb113"}
]

IGNORED = "ignore_key"
temp = dict((tuple(item for item in d.items() if item[0] != IGNORED),
             (IGNORED, d.get(IGNORED))) for d in thelist)
thelist = [dict(key + (value,)) for key, value in temp.items()]

for item in thelist:
    print(item)

Output (Python 3.7+, where dicts preserve insertion order):

{'key': 'value1', 'k2': 'va1', 'ignore_key': 'arb1'}
{'key': 'value2', 'k2': 'va2', 'ignore_key': 'arb113'}
martineau