-1

I am practicing some questions from the Codility. However every time I run the questions I am getting a very low score(25%) for the performance(runtime). Can you help me know how to improve my codes so as to score a better score?

The question is:

Write a function:

def solution(A)

that, given an array A consisting of N integers fulfilling the above conditions, returns the value of the unpaired element.

For example, given array A such that:

  A[0] = 9  A[1] = 3  A[2] = 9
  A[3] = 3  A[4] = 9  A[5] = 7
  A[6] = 9

the function should return 7, as explained in the example above.

And My code for the same is:

def solution(A):
# write your code in Python 3.6
    lis=[i for i in A if A.count(i) ==1]
    return lis[0]

Output:

medium2 "medium random test n=100,003 ✘TIMEOUT ERROR Killed. Hard limit reached: 6.000 sec"

Tsyvarev
  • 60,011
  • 17
  • 110
  • 153
data being
  • 41
  • 1
  • 10
  • 3
    This may be a good question for the [codereview.se] stack site – G. Anderson Jun 20 '19 at 19:39
  • @G.Anderson Thank you. I wasn't knowing that. – data being Jun 20 '19 at 20:04
  • @AndrejKesely Thanks tried that approach as well. But codility gave the same score for both the approaches. Thank you for the helping hand. – data being Jun 20 '19 at 20:06
  • 2
    Optimization-related questions are suited for Stack Overflow, no needs to move them to Code Review. The problem, described in the question, is relatively small and focused, so I am not sure why "too broad" close reason is applied. Also, according to the answer, the problem can be solved by knowing the proper python functions, which is the very purpose of Stack Overflow. Voted for reopen. – Tsyvarev Jul 15 '19 at 09:55

3 Answers3

4

It's because list.count will search through the entire list every time, which is O(N) * N, or N**2. You can use collections.Counter to count how many times an item occurs once, or in one pass, and lookups are O(1) because it's a dictionary:

from collections import Counter

def solution(A):
    c = Counter(A)
    # this will iterate over all the key/value pairs
    # which is at worst N elements long
    return [k for k, v in c.items() if v==1]

To show increase in speed:

python -m timeit -s "from random import randint; A = [randint(0,500) for i in range(10000)]" "x = [a for a in A if A.count(a)==1]"
10 loops, best of 3: 957 msec per loop


python -m timeit -s "from random import randint; from collections import Counter; A = [randint(0,500) for i in range(10000)]; c = Counter(A)" "x = [s for s, v in c.items() if v==1]"
10000 loops, best of 3: 20.1 usec per loop

Even though I'm creating a random list every time, the average best run for the Counter implementation across 20 trials is 20.2us, whereas the list.count implementation is 962.1ms. So even though each run of timeit is not exactly apples to apples, I think the average shows for itself

C.Nivs
  • 12,353
  • 2
  • 19
  • 44
  • Thank you it works. Earlier they didn't allow me to import numpy, so I thought importing additional LIB was not allowed. Anyways this works. Thank you very much :) – data being Jun 20 '19 at 19:46
  • @AkhilT Counter is just a dict specialized for counting. You can implement the same algorithm using a plain dict – juanpa.arrivillaga Jun 20 '19 at 19:51
  • @juanpa.arrivillaga Thanks for the information. It was helpful. – data being Jun 20 '19 at 19:57
  • 1
    Minor optimization: Avoid nested lookup and `dict` lookup overhead by changing `[k for k in c if c[k]==1]` to `[k for k, v in c.items() if v==1]`. Probably slightly slower for small inputs, but should scale better for larger inputs. Also, don't put the import in the function itself unless the exercise requires it (it's surprisingly expensive, at least on CPython, to re-`import` a module, even when the module is already loaded and cached, thanks to the incredibly complex module lookup machinery). – ShadowRanger Jun 20 '19 at 20:00
  • @ShadowRanger true, I was putting this together relatively quickly, I'll add that as an edit – C.Nivs Jun 20 '19 at 20:12
2

Version with itertools.groupby() is roughly 3x more performant than version with collections.Counter:

import collections
from itertools import groupby
import timeit

l = [9, 3, 9, 3, 9, 7, 9]

def fn1(lst):
    return [v for v, g in groupby(sorted(lst)) if len([*g]) == 1]

def fn2(lst):
    k = collections.Counter(lst)
    return [i for i in k if k[i] == 1]

print(timeit.timeit(lambda: fn1(l), number=100_000, globals=globals()) )
print(timeit.timeit(lambda: fn2(l), number=100_000, globals=globals()) )

Prints:

0.11646193999331445
0.33489679799822625
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
  • Little bit slower but `sum(1 for _ in g)` wouldn't read the entire list into memory, instead of `len([*g])`. – Error - Syntactical Remorse Jun 20 '19 at 20:04
  • I suspect the performance improvement here is a product of the small input; for three unique items, `sorted`'s `O(n log n)` complexity hardly matters, and the inefficiencies in `groupby`'s implementation don't matter much, while `Counter`'s roughly `O(n)` counting complexity has no chance to shine. In my experience, `groupby` is a lot slower than one would assume (sometimes losing to `Counter` even when your input is already sorted, so no call to `sorted` is necessary). – ShadowRanger Jun 20 '19 at 20:04
  • @Error-SyntacticalRemorse: Or go wildly overboard and use [the fastest (asymptotically) fixed memory approach to counting an iterator/generator](https://stackoverflow.com/a/34404546/364696). :-) – ShadowRanger Jun 20 '19 at 20:05
  • @ShadowRanger Let'er rip... Though that solution appears to be the slowest (at least from testing it on repl.it) – Error - Syntactical Remorse Jun 20 '19 at 20:06
  • 1
    @ShadowRanger On my machine (AMD 2400G) the version with `groupby()` is faster approx till the length of 85-90 of input array. – Andrej Kesely Jun 20 '19 at 20:12
0

Try the following :

import collections 

k = collections.Counter(A)
return [ i for i in k if k[i] == 1]