
Is there a way to check whether a list contains any duplicates? For example:

list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]

list1.*method* = False # no duplicates
list2.*method* = True # contains duplicates
David542
  • Is this assuming the lists are always sorted? – tyjkenn Jun 28 '12 at 17:25
  • Possible duplicate: http://stackoverflow.com/questions/1920145/how-to-find-duplicate-elements-in-array-using-for-loop-in-python-like-c-c – tyjkenn Jun 28 '12 at 17:27
  • @tyjkenn: Checking for existence of duplicates is simpler than finding the actual duplicates (which is what the other question is about). – interjay Jun 28 '12 at 17:30
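As interjay's comment notes, an existence check is simpler than collecting the duplicates, and it can also stop at the first repeat instead of scanning the whole list. A minimal sketch of that idea (the function name `contains_duplicates` is my own):

```python
def contains_duplicates(seq):
    """Return True as soon as any element is seen a second time (short-circuits)."""
    seen = set()
    for item in seq:
        if item in seen:
            return True
        seen.add(item)
    return False

print(contains_duplicates([1, 2, 3, 4, 5]))     # False
print(contains_duplicates([1, 1, 2, 3, 4, 5]))  # True
```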

4 Answers


If you convert the list to a set temporarily, that will eliminate the duplicates in the set. You can then compare the lengths of the list and set.

In code, it would look like this:

list1 = [...]
tmpSet = set(list1)
haveDuplicates = len(list1) != len(tmpSet)
3Doubloons
  • +1 for including some actual text to explain what you are doing as opposed to just plopping down code. – jdi Jun 28 '12 at 17:34
  • @jdi: I actually tried to just plop down some code originally but it came under the 30 characters minimum. – 3Doubloons Jun 28 '12 at 17:50

Convert the list to a set to remove duplicates. Compare the lengths of the original list and the set to see if any duplicates existed.

>>> list1 = [1,2,3,4,5]
>>> list2 = [1,1,2,3,4,5]
>>> len(list1) == len(set(list1))  # no duplicates
True
>>> len(list2) == len(set(list2))  # duplicates
False
FogleBird

Check if the length of the original list is larger than the length of the unique "set" of elements in the list. If so, there must have been duplicates.

list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]

if len(list1) != len(set(list1)):
    # the set is shorter, so something repeated
    print("duplicates")
Paul Seeb

The set() approach only works for hashable objects, so for completeness, you can do it with plain iteration:

import itertools

def has_duplicates(iterable):
    """
    >>> has_duplicates([1,2,3])
    False
    >>> has_duplicates([1, 2, 1])
    True
    >>> has_duplicates([[1,1], [3,2], [4,3]])
    False
    >>> has_duplicates([[1,1], [3,2], [4,3], [4,3]])
    True
    """
    return any(x == y for x, y in itertools.combinations(iterable, 2))
lqc
  • Ouch. This one hurts for complexity. Better to write hash functions for your unhashable objects. – Joel Cornett Jun 28 '12 at 17:58
  • @JoelCornett Mind doing it for ``list`` ? – lqc Jun 28 '12 at 18:07
  • `listHash = lambda x: hash(tuple(x))`. Note that since this hash is just a one-time thing, you don't have to worry about objects mutating on you. – Joel Cornett Jun 28 '12 at 20:58
  • Here's a simpler one: ``lambda x: 1``. Creating such a function doesn't make ``list`` objects any more hashable, 'cause ``list.__hash__`` is still ``None``. As for efficiency, you can easily tweak this to take constant extra memory. Hashing is just a CPU/memory tradeoff. – lqc Jun 29 '12 at 07:04
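Following the `hash(tuple(x))` idea from the comment thread above, the set-based check can be adapted to lists of lists by converting each inner list to a tuple, which is hashable. A sketch (the name `has_duplicate_rows` is my own, and it assumes each inner list holds only hashable elements):

```python
def has_duplicate_rows(rows):
    """Set-based duplicate check for a list of lists: each inner list is
    converted to a tuple so it can live in a set, giving expected O(n)
    time overall instead of O(n^2) pairwise comparisons."""
    seen = set()
    for row in rows:
        key = tuple(row)  # assumes the row's elements are themselves hashable
        if key in seen:
            return True
        seen.add(key)
    return False

print(has_duplicate_rows([[1, 1], [3, 2], [4, 3]]))          # False
print(has_duplicate_rows([[1, 1], [3, 2], [4, 3], [4, 3]]))  # True
```

Unlike the `itertools.combinations` version above, this short-circuits at the first repeated row, at the cost of extra memory for the set — the CPU/memory tradeoff lqc mentions.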