Deleting duplicate items in a list or dictionary

Question

I am appending items into a list, and I do not want copies.

empty_list = []
empty_list.append('some_item')

I would like to check whether an exact copy of the item already exists in the list. If so, I would like the item NOT to be appended.

I think one should write an if statement to check whether the item already exists in the list. If so, do not append.

if 'some_item' not in empty_list:
    empty_list.append('some_item')
else:
    pass

Is there a Python method/function which does this?

EDIT: This is a duplicate question, it appears. However, the answers provided below seem better than the previous question's.

I wonder why the code _you've already written_ doesn't suit your needs. You can write _your own_ function that would be effective enough. — ForceBru, Jan 13 '16 at 18:33
Does the order of your list matter? Are the items hashable? If the answers to these questions are "No" and "Yes", then you should consider using a `set` instead of a list. You'll get de-duping for free. — mgilson, Jan 13 '16 at 18:34
The problem with the code you suggested - checking whether the item is in the list using `not in` - is that the whole list has to be examined for each appended item. For short lists this might work just fine, but if you wanted to, say, append 10,000 distinct items to an empty list, it would require 0 + 1 + ... + 9,999 comparisons. That's roughly 50,000,000 comparison operations! In general, inserting N distinct items into an empty list will require about `(N^2)/2` comparisons. — Adam Brown, Jan 13 '16 at 19:00
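
As a rough check of the arithmetic in the comment above, here is a minimal sketch that counts worst-case comparisons, assuming every appended item is distinct so that each `not in` test scans the whole list. The function name `worst_case_comparisons` is made up for illustration.

```python
def worst_case_comparisons(n):
    # The k-th append (k = 0 .. n-1) scans a list that already holds k items,
    # so the total is 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
    return sum(range(n))

total = worst_case_comparisons(10000)
print(total)  # 49995000, i.e. roughly (N^2)/2 for N = 10,000
```

The exact figure lands just under 50 million because the first append scans an empty list; the quadratic growth is the point.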

score 4 · Accepted Answer · answered Jan 13 '16 at 18:32

To do it efficiently, use a set to keep track of the items you have already appended:

seen = set()
L = []
if 'some_item' not in seen:
    L.append('some_item')
    seen.add('some_item')
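
If you append in many places, the same pattern can be wrapped in a small helper. This is only a sketch of one way to package it: the name `append_unique` is hypothetical, and it assumes the items are hashable.

```python
def append_unique(items, seen, item):
    """Append item to items only if it has not been seen before."""
    if item not in seen:       # set lookup, O(1) on average
        items.append(item)     # the list keeps insertion order
        seen.add(item)         # the set remembers what was added

L = []
seen = set()
for value in ['a', 'b', 'a', 'c', 'b']:
    append_unique(L, seen, value)

print(L)  # ['a', 'b', 'c']
```

The list and the set must always be updated together, which is why keeping both updates inside one function can help.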

Eugene Yarmash

Which works so long as the items can be hashed. – mgilson Jan 13 '16 at 18:33
Depending on the circumstances, the list may not be needed at all. – Lev Levitsky Jan 13 '16 at 18:33
@LevLevitsky In order to retain ordering, I'd prefer a list – ShanZhengYang Jan 13 '16 at 18:48
Thanks! I think this is a great idea, though @mgilson is correct. This answer is why I think this question isn't "duplicate", as it's a unique solution (which I'm using). – ShanZhengYang Jan 13 '16 at 21:00

wim · Answer 2 · 2016-01-28T02:00:36.047

If you have to check using `if 'some_item' not in my_list`, then it is an O(n) search of the entire list every time. If the items are not necessarily hashable, this is probably still the most Pythonic way to do it.
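
To illustrate the unhashable case, here is a minimal sketch with lists as items. The helper name `append_if_absent` is made up for this example. A set would reject these items with a `TypeError`, while `not in` only needs equality comparison.

```python
def append_if_absent(target, item):
    # Linear scan using ==, so it also works for unhashable items such as lists.
    if item not in target:
        target.append(item)

rows = []
for row in [[1, 2], [3, 4], [1, 2]]:
    append_if_absent(rows, row)

print(rows)  # [[1, 2], [3, 4]]
```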

You can use a set as the other answer suggests, but it's a bit annoying to maintain the two collections side by side like that.

Some people use an OrderedDict as a data structure which behaves like a list without duplicates (you just use `None` as every value). With this method, you don't bother to check whether an item is already in there; you simply assign it anyway and you will not get any duplicates.

The dict itself will behave the same as a list for iteration and membership tests, and if you need an actual list you can always create one with list(odict).

from collections import OrderedDict

output = list(OrderedDict.fromkeys(input_iterable))
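
For the incremental case in the question, where items arrive one at a time, the same idea could look like the sketch below. The variable names are illustrative only.

```python
from collections import OrderedDict

unique_items = OrderedDict()
for item in ['b', 'a', 'b', 'c', 'a']:
    # Assigning an existing key again keeps its original position.
    unique_items[item] = None

result = list(unique_items)
print(result)  # ['b', 'a', 'c']
```

On Python 3.7 and later a plain dict also preserves insertion order, so `dict` can be used in place of `OrderedDict` here.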

score 1 · Answer 3 · answered Jan 13 '16 at 18:34

The direct answer is to use a set, which automatically ignores duplicates.

my_set = set()
# iterate over your collection of 'some_item's, adding each one
# (`collection` is a placeholder for wherever your items come from)
for some_item in collection:
    my_set.add(some_item)

# Finally, if you need the items in a list, rather than a set:
my_list = list(my_set)

Prune

This destroys ordering, though. – wim Jan 13 '16 at 18:41
