Index of the first duplicate element

Question

I need help finding the index of the first duplicate element in an array. I tried this, but it gives me the indices of all duplicate elements.

res = [idx for idx, item in enumerate(input) if item in input[:idx]]

Please provide more information about your problem with code — Teshie Ethiopia, Oct 13 '22 at 16:46
@CarlHR Eliminating duplicates and finding the index of the first duplicate are related, but not the same. — CrazyChucky, Oct 13 '22 at 16:48
Sidenote: you're shadowing `input`. Once you define your own variable named that, you lose access to Python's builtin `input` (and could potentially encounter confusing bugs if you try to use it). — CrazyChucky, Oct 13 '22 at 16:50
When you say "the first duplicate element", do you mean the first duplicate of one specific number, or of each different number that has duplicates in the array? — ai2ys, Oct 13 '22 at 17:32

wim · Answer 1 · 2022-10-13T17:47:12.190

4

Unfortunately your idea is O(n^2), because slicing and checking membership in a list are each O(n), and you do both on every iteration of the loop. It's essentially a nested loop.
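To make the hidden inner loop visible, here is a rough sketch that spells out the comprehension from the question as two explicit loops. The function name `duplicate_indices_nested` and the sample list are made up for illustration.

```python
def duplicate_indices_nested(values):
    """Same result as the list comprehension, with both loops spelled out."""
    result = []
    for idx, item in enumerate(values):
        # `item in values[:idx]` hides this scan over everything before idx.
        for earlier in values[:idx]:
            if earlier == item:
                result.append(idx)
                break
    return result


values = [2, 3, 4, 5, 3, 2]
comprehension = [idx for idx, item in enumerate(values) if item in values[:idx]]
print(comprehension == duplicate_indices_nested(values))  # True
```

The outer loop runs n times and the inner scan can touch up to n elements each time, which is where the quadratic cost comes from.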

Instead, you can do this in O(n) using a single loop:

seen = {}
for i, item in enumerate(my_input):
    if item in seen:
        break
    seen[item] = i
else:
    i = item = None

If the loop ends with a break, item is the first duplicate item and i is its index, and seen[item] holds the index of the first occurrence of that item. If the loop finishes without finding a duplicate, the else clause sets both i and item to None.
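If you would rather not leave loop variables lying around, the same idea can be wrapped in a function. This is only a sketch: the name `first_duplicate` and the choice to return a tuple are assumptions, not part of the answer above. It also assumes the items are hashable, since they are used as dict keys.

```python
def first_duplicate(values):
    """Return (index, first_index) for the first repeated item, or None."""
    seen = {}  # maps each item to the index where it first appeared
    for i, item in enumerate(values):
        if item in seen:
            return i, seen[item]
        seen[item] = i
    return None  # no duplicate found


print(first_duplicate([2, 3, 4, 5, 3]))  # (4, 1)
print(first_duplicate([1, 2, 3]))        # None
```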

edited Oct 13 '22 at 17:47

answered Oct 13 '22 at 16:57

wim


You can use a set instead: return the first index of something already in it – Mad Physicist Oct 13 '22 at 17:52
I added a couple of implementations – Mad Physicist Oct 13 '22 at 18:02
@MadPhysicist The [previous revision of this answer](https://stackoverflow.com/revisions/74059349/1) used a set. – wim Oct 13 '22 at 18:23

CrazyChucky · Answer 2 · 2022-10-15T23:02:54.290

tl;dr: use wim's or Mad Physicist's answers. Anything based on the technique you're starting with will do a lot of unnecessary looping. The following isn't so much a solution as some parallel ideas about how your current code could be improved.

The simplest modification to your code would be to simply access the first item of the resulting list:

res = [idx for idx, item in enumerate(values) if item in values[:idx]][0]

But that's kind of silly, since you'll still be traversing the entire list, finding all duplicates, then throwing away all but one. It will also raise an IndexError if the resulting list is empty (that is, if there are no duplicates). A cleaner and more efficient method would be to call next on a version of your list comprehension turned into a generator expression:

res = next((idx for idx, item in enumerate(values) if item in values[:idx]), None)

None is supplied as a default which will be returned if no duplicates are found.
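A small sketch of the difference on input without duplicates (the sample list is made up for illustration): indexing the empty result fails, while next with a default does not.

```python
values = [1, 2, 3]  # no duplicates

matches = [idx for idx, item in enumerate(values) if item in values[:idx]]
try:
    matches[0]
except IndexError:
    print("indexing the empty list raises IndexError")

# The generator stops at the first match, or yields nothing and falls back to None.
res = next((idx for idx, item in enumerate(values) if item in values[:idx]), None)
print(res)  # None
```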

This still has the same efficiency issues that wim points out, but knowing about indexing, next, and generator expressions can be useful!

I figured out how to do it in O(N) if you're willing to initialize a set on a separate line — Mad Physicist, Oct 13 '22 at 18:04
I usually try to avoid side effects and `or` tricks like that because I come back in a month and have to puzzle out what I did, but that's clever! — CrazyChucky, Oct 13 '22 at 18:36

Mad Physicist · Answer 3 · 2022-10-13T18:02:18.670

You can maintain a set of items you've seen so far:

def first_dup(a):
    seen = set()
    for i, e in enumerate(a):
        if e in seen:
            return i
        seen.add(e)
    return -1

You can initialize the set on a separate line and combine the rest into an efficient one-liner using next:

def first_dup(a):
    seen = set()
    return next((i for i, e in enumerate(a) if (e in seen or seen.add(e))), -1)

This cheats by relying on two facts: or evaluates its second operand whenever the first is falsy, and set.add is an in-place operation that returns None. A new element is therefore added to the set without ever being reported as a duplicate.
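To see what the condition evaluates to in each case, here is a tiny sketch of the same expression outside the generator (the variable names `first` and `second` are just for illustration):

```python
seen = set()

# 3 is new: `in` is False, so add() runs, mutates the set, and returns None.
first = 3 in seen or seen.add(3)
# 3 is now known: `in` is True, so `or` short-circuits and add() is not called.
second = 3 in seen or seen.add(3)

print(first)   # None
print(second)  # True
print(seen)    # {3}
```

Because None is falsy, the generator skips an element the first time it appears and yields its index only when it shows up again.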

score 0 · Answer 4 · answered Oct 13 '22 at 16:54


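If you want the index of the earlier occurrence (where the repeated value first appears):

array = [2, 3, 4, 5, 3]
for i, a in enumerate(array):
    # compare array[i] with every element before it
    for j in range(i):
        if a == array[j]:
            print(j)  # index of the earlier occurrence
            exit()    # ends the whole script, leaving both loops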

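If you want the index of the later occurrence (the duplicate itself):

array = [2, 3, 4, 5, 3]
for i, a in enumerate(array):
    # compare array[i] with every element before it
    for j in range(i):
        if a == array[j]:
            print(i)  # index of the later occurrence
            exit()    # ends the whole script, leaving both loops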
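If ending the program is not acceptable, one alternative is to put the loops in a function and use return, which leaves both loops at once. This is a sketch under that assumption; the name `first_duplicate_pair` is made up, and the approach is still O(n^2).

```python
def first_duplicate_pair(array):
    """Return (earlier_index, later_index) of the first repeated value, or None."""
    for i, a in enumerate(array):
        for j in range(i):
            if a == array[j]:
                return j, i  # return exits both loops without ending the script
    return None


print(first_duplicate_pair([2, 3, 4, 5, 3]))  # (1, 4)
```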

answered Oct 13 '22 at 16:54

FatemeZamanian


This is also _O(n^2)_, suboptimal. – wim Oct 13 '22 at 17:00
Why use `exit()` rather than breaking out of the loop? – Chris Oct 15 '22 at 22:36
Because I wanted to exit all loops at once and I had no problem with closing the program. If you want, you can use breaks like this: https://stackoverflow.com/a/3150107/15123109 – FatemeZamanian Oct 15 '22 at 22:44
