
Is there a way to check whether a list contains any duplicates? For example:

list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]

list1.*method* = False # no duplicates
list2.*method* = True # contains duplicates
David542
  • Is this assuming the lists are always sorted? – tyjkenn Jun 28 '12 at 17:25
  • Possible duplicate: http://stackoverflow.com/questions/1920145/how-to-find-duplicate-elements-in-array-using-for-loop-in-python-like-c-c – tyjkenn Jun 28 '12 at 17:27
  • @tyjkenn: Checking for existence of duplicates is simpler than finding the actual duplicates (which is what the other question is about). – interjay Jun 28 '12 at 17:30
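As interjay's comment notes, an existence check is simpler than collecting the duplicates, and it can also stop at the first repeat instead of scanning the whole list. A minimal sketch of that idea (the function name `contains_duplicates` is my own):

```python
def contains_duplicates(seq):
    """Return True as soon as any element is seen a second time (short-circuits)."""
    seen = set()
    for item in seq:
        if item in seen:
            return True
        seen.add(item)
    return False

print(contains_duplicates([1, 2, 3, 4, 5]))     # False
print(contains_duplicates([1, 1, 2, 3, 4, 5]))  # True
```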

4 Answers


If you convert the list to a set temporarily, that will eliminate the duplicates in the set. You can then compare the lengths of the list and set.

In code, it would look like this:

list1 = [...]
tmpSet = set(list1)
haveDuplicates = len(list1) != len(tmpSet)
3Doubloons
  • +1 for including some actual text to explain what you are doing as opposed to just plopping down code. – jdi Jun 28 '12 at 17:34
  • @jdi: I actually tried to just plop down some code originally but it came under the 30 characters minimum. – 3Doubloons Jun 28 '12 at 17:50

Convert the list to a set to remove duplicates. Compare the lengths of the original list and the set to see if any duplicates existed.

>>> list1 = [1,2,3,4,5]
>>> list2 = [1,1,2,3,4,5]
>>> len(list1) == len(set(list1))  # no duplicates
True
>>> len(list2) == len(set(list2))  # duplicates
False
FogleBird

Check if the length of the original list is larger than the length of the unique "set" of elements in the list. If so, there must have been duplicates.

list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]

if len(list1) != len(set(list1)):
    # the set is shorter, so something repeated
    print("duplicates")
Paul Seeb

The set() approach only works for hashable objects, so for completeness, you can do it with plain iteration:

import itertools

def has_duplicates(iterable):
    """
    >>> has_duplicates([1,2,3])
    False
    >>> has_duplicates([1, 2, 1])
    True
    >>> has_duplicates([[1,1], [3,2], [4,3]])
    False
    >>> has_duplicates([[1,1], [3,2], [4,3], [4,3]])
    True
    """
    return any(x == y for x, y in itertools.combinations(iterable, 2))
lqc
  • Ouch. This one hurts for complexity. Better to write hash functions for your unhashable objects. – Joel Cornett Jun 28 '12 at 17:58
  • @JoelCornett Mind doing it for ``list`` ? – lqc Jun 28 '12 at 18:07
  • `listHash = lambda x: hash(tuple(x))`. Note that since this hash is just a one-time thing, you don't have to worry about objects mutating on you. – Joel Cornett Jun 28 '12 at 20:58
  • Here's a simpler one: ``lambda x: 1``. Creating such a function doesn't make ``list`` objects any more hashable, 'cause ``list.__hash__`` is still ``None``. As for efficiency, you can easily tweak this to take constant extra memory. Hashing is just a CPU/memory tradeoff. – lqc Jun 29 '12 at 07:04
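Following the `hash(tuple(x))` idea from the comment thread above, the set-based check can be adapted to lists of lists by converting each inner list to a tuple, which is hashable. A sketch (the name `has_duplicate_rows` is my own, and it assumes each inner list holds only hashable elements):

```python
def has_duplicate_rows(rows):
    """Set-based duplicate check for a list of lists: each inner list is
    converted to a tuple so it can live in a set, giving expected O(n)
    time overall instead of O(n^2) pairwise comparisons."""
    seen = set()
    for row in rows:
        key = tuple(row)  # assumes the row's elements are themselves hashable
        if key in seen:
            return True
        seen.add(key)
    return False

print(has_duplicate_rows([[1, 1], [3, 2], [4, 3]]))          # False
print(has_duplicate_rows([[1, 1], [3, 2], [4, 3], [4, 3]]))  # True
```

Unlike the `itertools.combinations` version above, this short-circuits at the first repeated row, at the cost of extra memory for the set — the CPU/memory tradeoff lqc mentions.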