3

I've decided to learn python and I use CodeFight to train. The first Interview Practice exercice is about finding the first duplicate of an array and returning it or retruning -1 if there isn't any. This is the code I wrote:

def firstDuplicate(a):
b=[]
print(len(a))
for i in range(len(a)):
    if a[i] in b:
        return(a[i])
    elif a[i] not in b and i == (len(a)-1):
        return(-1)
    else:
        b.append(a[i])

I pass all the tests except the last 3. I says that my program takes longer than 4000ms to execute. I guess the array is very long and the duplicate is at the end. How can I reduce this execution time ? Thanks in advance.

cs95
  • 379,657
  • 97
  • 704
  • 746
Pierre Trott
  • 31
  • 1
  • 5
  • 2
    The check `a[i] not in b` is not necessary since you are already in the else branch of `if a[i] in b` – Harald Gliebe Oct 01 '17 at 14:31
  • Also, the most obvious optimisation would be to change `b` to a `set`. – cs95 Oct 01 '17 at 14:31
  • It's faster to use list comprehension instead of a `for` loop. That would change the whole code. – GIZ Oct 01 '17 at 14:32
  • There are a lot of things that could be done to speed this up. One is to use a more efficient check to see if you're encountered a value before (i.e., a set). Another is to avoid pointless duplication of effort (this is just common sense). Check for a duplicate *once* per iteration, don't check a second time once you've already ruled it out, doubling the runtime. And stop making redundant end-of-list checks. Just let the loop exit normally, and return -1 *outside* of the loop. This also fixes your bug with empty lists, for which you will return None instead of -1. – Tom Karzes Oct 01 '17 at 14:36

4 Answers4

8

Your current solution uses a list to do book-keeping, the in membership test is always going to be linear in time. You should instead consider using a set, or keeping counts of values in a Counter.

The idea in both cases is not to iterate through the complete list if not necessary. With a little extra space, you improve performance.


def firstDuplicateSet(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)

    return -1

from collections import Counter

def firstDuplicateCounter(a):
    c = Counter()
    for i in a:
        c[i] += 1
        if c[i] > 1:
            return i

    return -1
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Ok, thanks a lot, I 'm going to look at the wiki on python.org for sets as I never seen them yet. I'm sorry I didn't post it on the Code Review and I'll make sure I do so in the future. – Pierre Trott Oct 01 '17 at 14:37
3

You could rewrite it this way:

def opt_first_duplicate(arr):
    unique_nb = set()
    for nb in arr:
        if nb in unique_nb:
            return nb
        else: 
            unique_nb.add(nb)
    return -1

I compared our solutions using %%timeit magic command in a jupyter notebook with a test array generated this way:

import random
test_array = [random.randint(0, 10000000) for i in range(5000)]

Example of one of the run:

firstDuplicate : 401 ms ± 1.61 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

opt_first_duplicate: 600 µs ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

2 tricks that optimized your code:

  • Using a set instead of a list to look for already seen integers
  • The second test (elif) is useless since you already tested the presence of the integer

Hope it solves your problem

On sets faster than lists: What makes sets faster than lists in python?

Mederic Fourmy
  • 321
  • 1
  • 2
  • 11
0
from collections import Counter
list1 = [1,2,2,3,4,4]
dict1 = Counter(list1)
duplist = [key for (key,value) in dict1.items()if value >1]
for item in list1:
    if item in duplist:
        print ('first dup',item )
        break
Selva
  • 1
0

This is not the exact answer for the question, but this may help many developers landing here. Here list comprehension and enumerate are made use of.

mylist=[1,2,3,4,5,1,6,7,8,9,5]
length=len(mylist)

all_duplicates=[v for i,v in enumerate(mylist) if i < length-1 and v in mylist[i+1:]]
[1, 5]

first_duplicate=next((v for i,v in enumerate(mylist) if i < length-1 and v in mylist[i+1:]),None)
1

I hope the code explains itself

Mohammed Shareef C
  • 3,829
  • 25
  • 35