Repetition inside of lists

Question

I want to build a function that will return True if any two items in a list are the same.

For example, [1,7,3,7,4] should return True and ["one","ONE","One"] should return False.

I need help with which parts of python look for duplicates.

Related: [Identify duplicate values in a list in Python](http://stackoverflow.com/q/11236006) — Martijn Pieters, Feb 23 '15 at 16:27

score 2 · Answer 1 · answered Feb 23 '15 at 16:26

2

Loop over the values and use a set to track what you have already seen. As soon as you see a value again, return True:

def has_duplicates(lst):
    seen = set()
    for elem in lst:
        if elem in seen:
            return True
        seen.add(elem)
    return False

This is very efficient in that it short-circuits; it won't loop over the whole list if a duplicate has been detected early on.

answered Feb 23 '15 at 16:26

Martijn Pieters

1,048,767
296
4,058
3,343

Wouldn't a list be faster? (`seen = []`) I did a quick `timeit` (`timeit.timeit("s.add(5)", "s = set()")` was `0.100...` and `timeit.timeit("s.append(5)", "s = []")` was `0.0911`) – Feb 23 '15 at 16:30
Why not just do: `if len(set(lst)) != len(lst)` – Gillespie Feb 23 '15 at 16:30
@RPGillespie Because that has to iterate over the whole list. – Feb 23 '15 at 16:31
@Reticality because `set(lst)` iterates over the whole list to produce the set. No, using a list for `seen` is going to be *slower*. Don't compare appending speed of sets and lists here, it is *membership testing* that'll kill you, as it is O(N) for lists vs. O(1) for sets. The larger the number of unique values the slower testing against the list is going to be. – Martijn Pieters Feb 23 '15 at 16:33

score 1 · Answer 2 · edited May 23 '17 at 12:05

1

Martijn's answer is the best, but with a few exceptions, this is worth a try.

>>> chk = lambda x: len(l) != len(set(l)) # check the length after removing dupes. 
>>> l = [1,7,3,7,4] 
>>> chk(l)
True
>>> l = ["one","ONE","One"]
>>> chk(l)
False

Note - As Martijn mentions in a comment, this is a slower process.

edited May 23 '17 at 12:05

Community

1
1

answered Feb 23 '15 at 16:27

Bhargav Rao

50,140
28
121
140

1

This requires iteration over the *whole* list. Try this out with `[1] + list(range(1000000))` for example. – Martijn Pieters Feb 23 '15 at 16:28
@MartijnPieters Yep, It is a very slow way to do!. :) (But hopefully not a *wrong* way) – Bhargav Rao Feb 23 '15 at 16:29

Padraic Cunningham · Answer 3 · 2015-02-23T16:37:33.370

1

Using a collections.Counter dict:

from collections import Counter
def has_dupes(l):
    # if most repeated key count is > 1 we have at least one dupe
    return Counter(l).most_common(1)[0][1] > 1

Or use any:

def has_dupes(l):
    return any(v > 1 for v in Counter(l).values())

edited Feb 23 '15 at 16:37

answered Feb 23 '15 at 16:32

Padraic Cunningham

176,452
29
245
321

1

This is slower still than using `len(set(l))` as it requires a O(N) loop over all the values, followed by creating a heap to find the most common element, again O(N). Asymptotically the same as `len(set(l))` but with a higher fixed cost per iteration. – Martijn Pieters Feb 23 '15 at 16:52
1

@MartijnPieters, yes, but a linear solution, another example of how to do what the OP wants and an introduction to a Counter dict which is a pretty useful tool. If the question was what is the optimal way because I have a huge amount of data then that would be different but it's not. – Padraic Cunningham Feb 23 '15 at 16:54
1

I'm merely commenting on the performance aspects of the solution, because I do think that matters. Otherwise this whole question would just be a dupe of [Identify duplicate values in a list in Python](http://stackoverflow.com/q/11236006) – Martijn Pieters Feb 23 '15 at 16:57
1

@MartijnPieters, fair enough but I don't think in this case performance is a concern at all. `has_dupes([1] + list(range(1000000)))` takes `249ms` ` – Padraic Cunningham Feb 23 '15 at 17:00
1

If it is no concern we may as well dupe this. But there is a *generic* use here for future visitors. Some of those may well be concerned with performance. – Martijn Pieters Feb 23 '15 at 17:05
1

@MartijnPieters. I am not disputing the extra work in mine I just don't think the OP is overly concerned with performance, I also don't think the title does either. – Padraic Cunningham Feb 23 '15 at 17:08
1

But all the solutions here *echo the other post*. My point being that if you *only* need to know *if* there is a duplicate, then you can use a single loop and exit early. Otherwise just use any of the techniques in the dupe and turn that into a boolean. – Martijn Pieters Feb 23 '15 at 17:10
1

@MartijnPieters, well all bar one of the related questions are not exactly great answers. There are n log n and quadratic solutions and the accepted answer use items instead of most_common or just the values, would you recommend `if mylist.count(x) > 1`? – Padraic Cunningham Feb 23 '15 at 17:13
`mylist.count(x) > 1` is terrible. I didn't say the options in that post were all that great. But the accepted answer is correct if you wanted to know *which* elements were duplicated. – Martijn Pieters Feb 23 '15 at 17:17

Repetition inside of lists

3 Answers3