In Python, how do I find duplicates in a sequence and collect them in a list?

Question

I was not allowed to answer "How do I find the duplicates in a list and create another list with them?", but I think my solution is worth sharing. So I will generalize the question, and I would be glad to get feedback.

How would you answer the question for this list?

a = [1, 2, 1, 1, 2, 3]

score 0 · Answer 1 · answered Jun 17 '21 at 19:45

Very simple:

have = []        # items seen at least once
duplicates = []  # items seen more than once
for item in a:
    if item not in have:
        have.append(item)
    else:
        # record each duplicated item only once
        if item not in duplicates:
            duplicates.append(item)

The condition after `else` just makes sure that each duplicated item is added to the list `duplicates` only once.

Martino

These `not in` queries are inefficient for lists. This is going to be on the order of `O(n^2)`. – Bungo Jun 17 '21 at 19:48
@Bungo how does it apply in this case? I thought `in` was rather efficient, so why shouldn't `not in` be? – pylo Jun 17 '21 at 20:29
`in` and `not in` would be efficient when used with data structures such as sets or dictionaries, which are designed to make these queries fast. But a list is just a linear array. If the item is in fact not in the list, then `if item not in have` has to iterate through the entire list in order to make this determination. That's why [the top answer at the duplicate candidate](https://stackoverflow.com/a/9835819/4032910) uses a dictionary (`seen`) for this purpose. – Bungo Jun 17 '21 at 20:46

score 0 · Answer 2 · answered Jun 17 '21 at 19:47

I'm not sure what your question is exactly. But if you want to compare two lists and place the similarities into one list you can do something like this:

a = [1, 2, 1, 1, 2, 3]
b = [1, 2, 5, 6, 1, 3]
c = []

for item in a:
    if item in b:
        c.append(item)
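
Note that each `item in b` test above scans the list `b` from the start. A minimal sketch of the same comparison with `b` converted to a set first, assuming all items are hashable (the name `b_items` is only illustrative):

```python
a = [1, 2, 1, 1, 2, 3]
b = [1, 2, 5, 6, 1, 3]

# Build the set once; membership tests against it are O(1) on average.
b_items = set(b)

# Keep every item of a that also occurs in b, in the order of a.
c = [item for item in a if item in b_items]

print(c)  # [1, 2, 1, 1, 2, 3]
```

The result is the same as with the loop above; only the cost of each membership test changes.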

pylo · Answer 3 · 2022-04-29T21:02:40.500

The accepted answer to the original question is the best, as I have learned. I missed the fact that a look-up in a set or dictionary takes only O(1) time on average. Sets and dicts locate their elements via hashes, so the position in memory is known directly, which is very neat. (NB: calling set() on a sequence of hashable elements removes any duplicates, and the set is created efficiently: O(n) for n items.)

I demonstrate this by timing a search for the last item in a list against a search for a random item in a set:

>>> import timeit, random
>>> rng = range(100999)
>>> myset = set(rng)
>>> mylist = list(rng)
>>> # How long does it take to test the list for its last value
>>> # compared to testing for a value in a set?
>>> timeit.timeit(lambda: mylist[-1] in mylist, number=9999)
12.907866499997908
>>> timeit.timeit(lambda: random.choice(rng) in myset, number=9999)
0.012736899996525608
>>>
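
Applied to the list from the question, the set-based idea might look like the sketch below. The function name `find_duplicates` is made up for illustration, and the sketch assumes that all items are hashable; it is one possible way to do it, not a copy of the accepted answer.

```python
def find_duplicates(seq):
    """Return each item that occurs more than once in seq, in order of first repeat."""
    seen = set()        # items encountered so far
    duplicates = set()  # items already recorded as duplicates
    result = []         # duplicates in the order they were first repeated
    for item in seq:
        if item in seen:
            # second or later occurrence: record it only once
            if item not in duplicates:
                duplicates.add(item)
                result.append(item)
        else:
            seen.add(item)
    return result


a = [1, 2, 1, 1, 2, 3]
print(find_duplicates(a))  # [1, 2]
```

Every membership test here goes against a set, so the whole pass is O(n) on average instead of O(n^2). The extra list `result` is only there to keep the output order stable.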
