Finding the first duplicate of an array

Question

I've decided to learn python and I use CodeFight to train. The first Interview Practice exercice is about finding the first duplicate of an array and returning it or retruning -1 if there isn't any. This is the code I wrote:

def firstDuplicate(a):
b=[]
print(len(a))
for i in range(len(a)):
    if a[i] in b:
        return(a[i])
    elif a[i] not in b and i == (len(a)-1):
        return(-1)
    else:
        b.append(a[i])

I pass all the tests except the last 3. I says that my program takes longer than 4000ms to execute. I guess the array is very long and the duplicate is at the end. How can I reduce this execution time ? Thanks in advance.

The check `a[i] not in b` is not necessary since you are already in the else branch of `if a[i] in b` — Harald Gliebe, Oct 01 '17 at 14:31
Also, the most obvious optimisation would be to change `b` to a `set`. — cs95, Oct 01 '17 at 14:31
It's faster to use list comprehension instead of a `for` loop. That would change the whole code. — GIZ, Oct 01 '17 at 14:32
There are a lot of things that could be done to speed this up. One is to use a more efficient check to see if you're encountered a value before (i.e., a set). Another is to avoid pointless duplication of effort (this is just common sense). Check for a duplicate *once* per iteration, don't check a second time once you've already ruled it out, doubling the runtime. And stop making redundant end-of-list checks. Just let the loop exit normally, and return -1 *outside* of the loop. This also fixes your bug with empty lists, for which you will return None instead of -1. — Tom Karzes, Oct 01 '17 at 14:36

cs95 · Accepted Answer · 2018-04-05T00:03:22.640

8

Your current solution uses a list to do book-keeping, the in membership test is always going to be linear in time. You should instead consider using a set, or keeping counts of values in a Counter.

The idea in both cases is not to iterate through the complete list if not necessary. With a little extra space, you improve performance.

def firstDuplicateSet(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)

    return -1

from collections import Counter

def firstDuplicateCounter(a):
    c = Counter()
    for i in a:
        c[i] += 1
        if c[i] > 1:
            return i

    return -1

edited Apr 05 '18 at 00:03

answered Oct 01 '17 at 14:35

cs95

379,657
97
704
746

Ok, thanks a lot, I 'm going to look at the wiki on python.org for sets as I never seen them yet. I'm sorry I didn't post it on the Code Review and I'll make sure I do so in the future. – Pierre Trott Oct 01 '17 at 14:37

score 3 · Answer 2 · answered Oct 01 '17 at 14:57

You could rewrite it this way:

def opt_first_duplicate(arr):
    unique_nb = set()
    for nb in arr:
        if nb in unique_nb:
            return nb
        else: 
            unique_nb.add(nb)
    return -1

I compared our solutions using %%timeit magic command in a jupyter notebook with a test array generated this way:

import random
test_array = [random.randint(0, 10000000) for i in range(5000)]

Example of one of the run:

firstDuplicate : 401 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

opt_first_duplicate: 600 µs ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

2 tricks that optimized your code:

Using a set instead of a list to look for already seen integers
The second test (elif) is useless since you already tested the presence of the integer

Hope it solves your problem

On sets faster than lists: What makes sets faster than lists in python?

score 0 · Answer 3 · answered May 30 '19 at 06:11

0

from collections import Counter
list1 = [1,2,2,3,4,4]
dict1 = Counter(list1)
duplist = [key for (key,value) in dict1.items()if value >1]
for item in list1:
    if item in duplist:
        print ('first dup',item )
        break

answered May 30 '19 at 06:11

Selva

1

score 0 · Answer 4 · answered Jul 14 '21 at 14:51

This is not the exact answer for the question, but this may help many developers landing here. Here list comprehension and enumerate are made use of.

mylist=[1,2,3,4,5,1,6,7,8,9,5]
length=len(mylist)

all_duplicates=[v for i,v in enumerate(mylist) if i < length-1 and v in mylist[i+1:]]
[1, 5]

first_duplicate=next((v for i,v in enumerate(mylist) if i < length-1 and v in mylist[i+1:]),None)
1

I hope the code explains itself

Finding the first duplicate of an array

4 Answers4

Linked