I have a list of dictionaries in Python as follows:

[{'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
 {'category': 'software', 'name': 'irssi', 'version': '1.1.2'},
 {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}]

(The list comes from parsing a text data file.)

What I want to do:

If category and name are the same, I want to keep the first appearance of a package entry and remove the rest, so the final output would look like:

[{'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
 {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}]

How can I achieve this? I tried converting the list of dictionaries to a dictionary and then iterating over it with dict.items(), but with no luck.

jukebox
Lorem
  • Please take a look at https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python – Dharmesh Fumakiya Mar 23 '19 at 18:55
  • @Dharmesh These aren't really duplicates, though. Only 2 key/value pairs are the same; the `version` is different. – Aran-Fey Mar 23 '19 at 18:56
  • Possible duplicate of [Remove duplicate dict in list in Python](https://stackoverflow.com/questions/9427163/remove-duplicate-dict-in-list-in-python) – Pedro Rodrigues Mar 23 '19 at 18:58

3 Answers

Use a set to keep track of all (category, name) pairs you've already seen:

lst = [
    {'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
    {'category': 'software', 'name': 'irssi', 'version': '1.1.2'},
    {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}
]

seen = set()
result = []

for dic in lst:
    key = (dic['category'], dic['name'])
    if key in seen:
        continue

    result.append(dic)
    seen.add(key)

print(result)
# output: [{'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
#          {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}]

This can be generalized into a function:

def keep_first(iterable, key=None):
    if key is None:
        key = lambda x: x

    seen = set()
    for elem in iterable:
        k = key(elem)
        if k in seen:
            continue

        yield elem
        seen.add(k)

>>> list(keep_first(lst, lambda d: (d['category'], d['name'])))
[{'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
 {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}]
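
As a variation on the call above, the key function could also be built with operator.itemgetter instead of a lambda. This is only a sketch: the generator is repeated (in a slightly condensed form) so the snippet runs on its own, and the names packages and unique_packages are made up for illustration. The input is deliberately not grouped, to show that the set-based approach does not depend on ordering.

```python
from operator import itemgetter


def keep_first(iterable, key=None):
    """Yield each element whose key has not been seen before."""
    seen = set()
    for elem in iterable:
        k = elem if key is None else key(elem)
        if k not in seen:
            seen.add(k)
            yield elem


# Hypothetical sample data; the duplicate 'irssi' entry is not adjacent.
packages = [
    {'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
    {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'},
    {'category': 'software', 'name': 'irssi', 'version': '1.1.2'},
]

# itemgetter with two names returns a (category, name) tuple, which is hashable.
unique_packages = list(keep_first(packages, key=itemgetter('category', 'name')))
print(unique_packages)
```

The original order of the surviving entries is preserved, and the keys only need to be hashable.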
Aran-Fey

Use itertools.groupby, and take first of each group:

import itertools

def uniq(lst):
    # groupby only merges adjacent items, so lst must already be
    # sorted (or at least grouped) by category and name.
    for _, grp in itertools.groupby(lst, lambda d: (d['category'], d['name'])):
        yield list(grp)[0]  # first entry of each group

lst = [{'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
       {'category': 'software', 'name': 'irssi', 'version': '1.1.2'},
       {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}]
print(list(uniq(lst)))
adrtam
  • This only works if the input list is already sorted by category and name, though. If you move `hexchat` in between the two `irssi` entries, this won't work. – Aran-Fey Mar 23 '19 at 18:57
  • Right, I assumed it was sorted, as in the example above. Otherwise you just need to sort it before passing it into the function. – adrtam Mar 23 '19 at 18:58
  • You don't have to consume all the group by creating a list. You can just `yield next(grp)`. – Asocia Jan 27 '21 at 15:58
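
Putting the comments above together, a possible sketch sorts the input first and takes only the first item of each group with next(). The name uniq_sorted and the sample list are assumptions made for this illustration. Because sorted() is stable, the first appearance within each (category, name) group is still the one that is kept, but the result comes out in sorted key order rather than in the original order.

```python
import itertools
from operator import itemgetter


def uniq_sorted(lst):
    """Sort by (category, name), then yield the first entry of each group."""
    key = itemgetter('category', 'name')
    # sorted() is stable, so entries with equal keys keep their relative order.
    for _, grp in itertools.groupby(sorted(lst, key=key), key):
        # next() takes the first item without consuming the rest of the group.
        yield next(grp)


# Hypothetical unsorted input: 'hexchat' sits between the two 'irssi' entries.
unsorted_lst = [
    {'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
    {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'},
    {'category': 'software', 'name': 'irssi', 'version': '1.1.2'},
]

deduplicated = list(uniq_sorted(unsorted_lst))
print(deduplicated)
```

If the original ordering matters, the set-based approach from the first answer avoids the sort entirely.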

Assuming you want to keep the latest version, you could build a dictionary that maps each category/name pair to the entry with the highest version, then take the list of values from that dictionary of dictionaries. Note that the versions are compared as plain strings here:

software = [{'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
            {'category': 'software', 'name': 'irssi', 'version': '1.1.2'},
            {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}]

lastVersion = dict()
for softInfo in software:
    key = (softInfo['category'],softInfo['name'])
    if key not in lastVersion or lastVersion[key]['version'] < softInfo['version']:
        lastVersion[key] = softInfo
software = list(lastVersion.values())

print(software)

# [{'category': 'software', 'name': 'irssi', 'version': '1.2.0'},
#  {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'}]
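
One caveat with the comparison above: version strings compare character by character, so '1.10.0' sorts before '1.9.0'. A possible workaround, sketched here under the assumption that every version is a purely numeric dotted string, is to compare tuples of integers instead. The helper names version_key and keep_latest are made up for this example.

```python
def version_key(version):
    """Turn '1.10.0' into (1, 10, 0). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split('.'))


def keep_latest(packages):
    """Keep, per (category, name) pair, the entry with the highest version."""
    latest = {}
    for pkg in packages:
        key = (pkg['category'], pkg['name'])
        if (key not in latest
                or version_key(latest[key]['version']) < version_key(pkg['version'])):
            latest[key] = pkg
    # Dicts preserve insertion order (Python 3.7+), so pairs appear in the
    # order in which each (category, name) was first seen.
    return list(latest.values())


# Hypothetical data where string comparison would pick the wrong entry.
sample = [
    {'category': 'software', 'name': 'irssi', 'version': '1.9.0'},
    {'category': 'software', 'name': 'irssi', 'version': '1.10.0'},
    {'category': 'software', 'name': 'hexchat', 'version': '2.14.2'},
]

latest_packages = keep_latest(sample)
print(latest_packages)
```

Versions with suffixes such as 'rc1' would make int() raise a ValueError; for those, a dedicated parser such as packaging.version.Version from the third-party packaging library may be a better fit.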
Alain T.