0

I just took a Codility demo test. The question and my answer can be seen here, but I'll paste my answer here as well. My response:

def solution(A):
    # write your code in Python 2.7

    retresult = 1; # the smallest integer we can return, if it is not in the array

    A.sort()
    for i in A:
        if i > 0:
            if i==retresult:   retresult += 1 # increment the result since the current result exists in the array
            elif i>retresult:   break # we can go out of the loop since we found a bigger number than our current positive integer result


    return retresult

My question is around time complexity, which I hope to better understand by your response. The question asks for expected worst-case time complexity is O(N).

Does my function have O(N) time complexity? Does the fact that I sort the array increase the complexity, and if so how?

Codility reports (for my answer)

Detected time complexity: 
O(N) or O(N * log(N))

So, what is the complexity for my function? And if it is O(N*log(N)), what can I do to decrease the complexity to O(N) as the problem states?

Thanks very much!

p.s. my background reading on time complexity comes from this great post.

EDIT

Following the reply below, and the answers described here for this problem, I would like to expand on this with my take on the solutions:

basicSolution has an expensive time complexity and so is not the right answer for this Codility test:

def basicSolution(A):
    # 0(N*log(N) time complexity

    retresult = 1; # the smallest integer we can return, if it is not in the array

    A.sort()
    for i in A:
        if i > 0:
            if i==retresult:   retresult += 1 #increment the result since the current result exists in the array
            elif i>retresult:   break # we can go out of the loop since we found a bigger number than our current positive integer result
        else:
            continue; # negative numbers and 0 don't need any work

    return retresult

hashSolution is my take on what is described in the above article, in the "use hashing" paragraph. As I am new to Python, please let me know if you have any improvements to this code (it does work though against my test cases), and what time complexity this has?

def hashSolution(A):
    # 0(N) time complexity, I think? but requires 0(N) extra space (requirement states to use 0(N) space

    table = {}

    for i in A:
        if i > 0:
            table[i] = True # collision/duplicate will just overwrite

    for i in range(1,100000+1): # the problem says that the array has a maximum of 100,000 integers
        if not(table.get(i)): return i

    return 1 # default

Finally, the actual 0(N) solution (O(n) time and O(1) extra space solution) I am having trouble understanding. I understand that negative/0 values are pushed at the back of the array, and then we have an array of just positive values. But I do not understand the findMissingPositive function - could anyone please describe this with Python code/comments? With an example perhaps? I've been trying to work through it in Python and just cannot figure it out :(

rishijd
  • 1,294
  • 4
  • 15
  • 32

1 Answers1

5

It does not, because you sort A.

The Python list.sort() function uses Timsort (named after Tim Peters), and has a worst-case time complexity of O(NlogN).

Rather than sort your input, you'll have to iterate over it and determine if any integers are missing by some other means. I'd use a set of a range() object:

def solution(A):
    expected = set(range(1, len(A) + 1))
    for i in A:
        expected.discard(i)
    if not expected:
        # all consecutive digits for len(A) were present, so next is missing
        return len(A) + 1
    return min(expected)

This is O(N); we create a set of len(A) (O(N) time), then we loop over A, removing elements from expected (again O(N) time, removing elements from a set is O(1)), then test for expected being empty (O(1) time), and finally get the smallest element in expected (at most O(N) time).

So we make at most 3 O(N) time steps in the above function, making it a O(N) solution.

This also fits the storage requirement; all use is a set of size N. Sets have a small overhead, but always smaller than N.

The hash solution you found is based on the same principle, except that it uses a dictionary instead of a set. Note that the dictionary values are never actually used, they are either set to True or absent. I'd rewrite that as:

def hashSolution(A):
    seen = {i for i in A if i > 0}
    if not seen:
        # there were no positive values, so 1 is the first missing.
        return 1
    for i in range(1, 10**5 + 1):
        if i not in seen:
            return i
    # we can never get here because the inputs are limited to integers up to
    # 10k. So either `seen` has a limited number of positive values below
    # 10.000 or none at all.

The above avoids looping all the way to 10.000 if there were no positive integers in A.

The difference between mine and theirs is that mine starts with the set of expected numbers, while they start with the set of positive values from A, inverting the storage and test.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • Thanks very much for your reply, I now understand the time complexity part for list.sort(). For the actual solution though, I'm not sure if this is for the original problem. Forgive me if it is, but could you take a look at the EDIT above in my post, and let me know your thoughts? Specifically, your reply on the hashSolution and your take on the findMissingPositive function would be greatly appreciated. Thanks! – rishijd Nov 16 '17 at 01:15
  • @rishijd: I tested my solution with all the sample inputs, and it gives the correct solution for all of those. It is a valid solution. – Martijn Pieters Nov 16 '17 at 07:53
  • @rishijd: the hash solution **is the same solution as I gave**, only using a dictionary with `True` to simulate a set. They inverted storage and testing, that's all. – Martijn Pieters Nov 16 '17 at 07:54
  • Thanks so much Martijn, brilliant! I had to play with the function myself to realise why "expected" works :) One last question - I'm still trying to understand the space complexity of solution() vs hashSolution(). I understand for hashSolution, Time: O(N). The creation of the "seen" set = 0(N), then we check if not seen = O(1), and then we go through a large range O(100000+1) - which is O(N). Is that correct? Space: requires O(N) extra space? Does hashSolution and solution need the same space? Just would like to know if my understanding of space complexity is correct via these examples. – rishijd Nov 16 '17 at 19:04