4

Is there a nice way to remove elements from a list by one of their attributes?

Example:

lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]

And I want to get this list:

new_lis = [['element1', 12], ['element2', 2], ['element4', 36]]

I am looking for a short and elegant solution; maybe there is a module I am not familiar with?

– NI6 (edited by styvane)
  • Do you want to remove duplicates by the `[1]` item from the main list? – sobolevn Jan 06 '16 at 10:28
  • 2
    What decides whether you want to keep `element1` or `element3` in your example? – poke Jan 06 '16 at 10:28
  • 1
    Possible duplicate of [How do you remove duplicates from a list in Python whilst preserving order?](http://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-in-python-whilst-preserving-order) – Leonid Glanz Jan 06 '16 at 10:30
  • 1
    The removal should be done by a sub-element of the inner lists – NI6 Jan 06 '16 at 10:32

3 Answers

4

The best way to do this is with a simple generator function. The reason is that a generator is evaluated lazily, which means it produces the items on demand; this saves a lot of memory for large lists. You can then iterate over the generator object and do something with each item.

Demo:

>>> lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]
>>> def deduplicate(items):
...     seen = set()
...     for item in items:
...         if item[1] not in seen:
...             seen.add(item[1])
...             yield item
... 
>>> deduplicate(lis)
<generator object deduplicate at 0x7fd454352e08>
>>> for item in deduplicate(lis):
...     print(item)
... 
['element1', 12]
['element2', 2]
['element4', 36]
>>> list(deduplicate(lis))
[['element1', 12], ['element2', 2], ['element4', 36]]
– styvane
2

Write a function for this:

def remove_duplicates_n(lis, n):
    'returns new list with items from lis and duplicates at position n removed, keeps order'
    seen = set()
    result = []
    for item in lis:
        if item[n] not in seen:
            result.append(item)
            seen.add(item[n])
    return result

For your desired result, call remove_duplicates_n(lis, 1).
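
For example, with the list from the question (a quick check of the expected output):

>>> lis = [['element1', 12], ['element2', 2], ['element3', 12], ['element4', 36], ['element5', 12]]
>>> remove_duplicates_n(lis, 1)
[['element1', 12], ['element2', 2], ['element4', 36]]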

Bonus: if you want to go to the dark side of side effects (`set.add` returns `None`, so `not seen.add(x[1])` is always truthy and only serves to record the value)...

>>> seen = set()
>>> [x for x in lis if x[1] not in seen and not seen.add(x[1])]
[['element1', 12], ['element2', 2], ['element4', 36]]
– timgeb
0

My proposal for a one-liner:

{key(elt): elt for elt in reversed(iterable)}.values()

The order of the iterable is not kept because of the reversed call; without it, the later duplicate elements would override the earlier ones. This might need to be adjusted depending on your constraints (see the order-preserving sketch after the example). It can be used like so, with the example given in the question:

from typing import Any, Callable, Iterable, Sequence, TypeVar
from operator import itemgetter

T = TypeVar("T")

def get_unique_elements(iterable: Sequence[T], key: Callable[[T], Any]) -> Iterable[T]:
    """
    Returns all unique elements from a sequence,
    using the key function to establish uniqueness.
    Elements appearing first will have priority in case of duplicates.
    """
    return {key(elt): elt for elt in reversed(iterable)}.values()

list(get_unique_elements(
    [
        ["element1", 12],
        ["element2", 2],
        ["element3", 12],
        ["element4", 36],
        ["element5", 12],
    ],
    key=itemgetter(1),
))

Out: [['element1', 12], ['element4', 36], ['element2', 2]]
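
If keeping the original order matters, one possible variant (not part of the answer above, just a sketch) is to iterate forward and let `dict.setdefault` keep the first element seen for each key:

first_seen = {}
for elt in lis:
    # setdefault stores elt only if the key is not present yet,
    # so the earliest occurrence wins and the original order is preserved
    first_seen.setdefault(elt[1], elt)

list(first_seen.values())

Out: [['element1', 12], ['element2', 2], ['element4', 36]]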

– Chadys