1

I am appending items into a list, and I do not want copies.

empty_list = []
empty_list.append('some_item')

I would like to check whether an exact "copy" of the item already exists in the list. If so, I do NOT want the item to be appended.

I think one should write an if statement to check whether the item already exists in the list. If so, do not append.

if 'some_item' not in empty_list:
    empty_list.append('some_item')
else:
    pass

Is there a Python method/function which does this?

EDIT: This is a duplicate question, it appears. However, the answers provided below seem better than the previous question's.

ShanZhengYang
  • 1
    I wonder why the code _you've already written_ doesn't suit your needs? You can write _your own_ function that would be effective enough. – ForceBru Jan 13 '16 at 18:33
  • Does the order of your list matter? Are the items hashable? If the answers to these questions are "No" and "Yes", then you should consider using a `set` instead of a list. You'll get de-duping for free. – mgilson Jan 13 '16 at 18:34
  • 2
    This smells duplicate. – erip Jan 13 '16 at 18:36
  • That `else: pass` is redundant – Eli Korvigo Jan 13 '16 at 18:36
  • The problem with the code you suggested (checking whether the item is in the list using `not in`) is that the whole list may have to be examined for each appended item. For short lists this might work just fine, but if you wanted to append, say, 10,000 distinct items to an empty list, it would require about 1 + 2 + ... + 10,000 comparisons. That's roughly 50,000,000 comparison operations! In general, inserting N distinct items into an empty list requires about `(N^2)/2` comparisons. – Adam Brown Jan 13 '16 at 19:00

3 Answers

4

To do it efficiently, use a set alongside the list:

seen = set()
L = []
if 'some_item' not in seen:
    L.append('some_item')
    seen.add('some_item')
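
As a minimal sketch, the set-plus-list pattern above could be wrapped in a small helper so the two collections stay in sync. The function name `append_unique` and the sample values are hypothetical, not part of the original answer, and the items are assumed to be hashable.

```python
def append_unique(items, seen, item):
    """Append item to items only if it is not already in seen.

    Returns True if the item was appended, False if it was a duplicate.
    """
    if item in seen:
        return False
    items.append(item)
    seen.add(item)
    return True


L = []
seen = set()
for value in ['a', 'b', 'a', 'c', 'b']:  # hypothetical sample input
    append_unique(L, seen, value)

print(L)  # ['a', 'b', 'c']
```

The list keeps first-seen order, while the set makes each membership check roughly constant time on average.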
Eugene Yarmash
2

If you have to check using `if 'some_item' not in my_list`, then it is an O(n) search of the entire list every time. If the items are not necessarily hashable, though, this is probably still the most Pythonic way to do it.
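
To illustrate the unhashable case, here is a small sketch using lists as items. The names `rows` and `set_rejected` and the sample data are made up for the example.

```python
rows = []
for row in [[1, 2], [3, 4], [1, 2]]:  # hypothetical sample input
    # List membership compares items with ==, so no hashing is needed
    if row not in rows:
        rows.append(row)

print(rows)  # [[1, 2], [3, 4]]

# A set cannot hold these items, because lists are unhashable
try:
    set().add([1, 2])
    set_rejected = False
except TypeError:
    set_rejected = True

print(set_rejected)  # True
```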

You can use a set as the other answer suggests, but it's a bit annoying to maintain the two collections side by side like that.

Some people use an OrderedDict as a data structure which behaves like a list without duplicates (you just use `None` for the values). With this method, you don't bother to check whether an item is already in there; you simply assign it anyway and you will not get any duplicates.

The dict itself behaves the same as a list for iteration and membership tests, and if you need an actual list you can always create one with list(odict).

from collections import OrderedDict

output = list(OrderedDict.fromkeys(input_iterable))
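
A short worked example of the idea, assuming hashable items; the sample data and the variable name `odict` (borrowed from the text above) are only illustrative.

```python
from collections import OrderedDict

input_iterable = ['b', 'a', 'b', 'c', 'a']  # hypothetical sample data

# Keys keep first-seen order; repeated keys do not create new entries
odict = OrderedDict.fromkeys(input_iterable)
print(list(odict))   # ['b', 'a', 'c']
print('a' in odict)  # True

# "Appending" is just an assignment; re-adding an existing key changes nothing
odict['d'] = None
odict['b'] = None
print(list(odict))   # ['b', 'a', 'c', 'd']

# On Python 3.7+, a plain dict also preserves insertion order
plain = list(dict.fromkeys(input_iterable))
print(plain)         # ['b', 'a', 'c']
```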
wim
1

The direct answer is to use a set, which automatically ignores duplicates.

my_set = set()
# iterate over your collection of items ('items' is a placeholder name), adding each one
for some_item in items:
    my_set.add(some_item)

# Finally, if you need the items in a list, rather than a set:
my_list = list(my_set)
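
A runnable sketch of the same approach with made-up sample data (`items` is a hypothetical name). One thing to be aware of: a set does not guarantee any particular order, so the resulting list may not match the order in which items were added. Sorting is used here only to get a predictable result.

```python
items = ['b', 'a', 'b', 'c', 'a']  # hypothetical sample input

my_set = set()
for some_item in items:
    my_set.add(some_item)  # duplicates are silently ignored

my_list = list(my_set)      # order is not guaranteed
print(len(my_list))         # 3
print(sorted(my_list))      # ['a', 'b', 'c']
```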
Prune