0
def firstMissingPositive(self, A):
    least = 1024*1024*1024
    # worst case n ops
    for i in A:
        if i < least and i > 0:
            least = i
    if least > 1:
        return 1
    else:
        # should be O(n)
        A_set = set(A)
        # worst case n ops again
        while True:
            least += 1
            if least not in A_set:
                return least

I only ask because it seemed too simple for the given problem, which most people here may have seen on Leetcode or someplace similar. Is there something I'm not aware of in Python's implementation of set or dict? Lookup should be O(1) and converting a list to a set should be O(n) from what I understand.

szhongren
  • 53
  • 3

1 Answers1

2

It's not O(1) space, it's O(n) space, since you need to build the set.

As for the time, according to the Python wiki, set containment takes O(n) worst case. So your algorithm is thus O(n^2). Note this is worst case - the average time for set containment is O(1) and so the average time complexity for your algorithm is indeed O(n).

You can get the worst-case down to O(n log n) by using an ordered set, but then the average-time will be O(n log n) as well.

Claudiu
  • 224,032
  • 165
  • 485
  • 680
  • 2
    Being O(n) doesn't rule out being O(1), and being O(n^2) doesn't rule out being O(n). Set containment is usually considered O(1), at least at LeetCode (where this is from), so it would count as O(n) time there. And I think you mean "average", not "amortized". – Stefan Pochmann Sep 19 '16 at 15:48
  • Ah that makes sense, but how is the worst case of a set containment O(n)? Isn't it just a hash function making an index into an array, which you can then check if it's been initialized? – szhongren Sep 19 '16 at 15:51
  • @szhongren it is indeed a hash function. But hash functions are not perfect and different objects can have the same hash (called hash collision and there are techniques to deal with it). Hence, worst case scenario you can have `n` hash collisions (all your keys with the same hash) and you would have to loop over them to find the proper one. This, however, will probably never happen, that is why its cost is `O(1)` on average. – Imanol Luengo Sep 19 '16 at 16:14
  • @StefanPochmann: Ah no I meant the [amortized](https://en.wikipedia.org/wiki/Amortized_analysis) set containment time is `O(1)`, but the non-amortized worst case time is `O(n)`. So you could either say this algorithm is `O(n^2)`, or amortized `O(n)` (which apparently can also be written as [`O(n)+`](http://stackoverflow.com/questions/200384/constant-amortized-time#comment35034963_249695)). But it isn't `O(n)` without any qualifications, since big-O notation is for worst-case times, not average times. – Claudiu Sep 19 '16 at 16:49
  • 1
    Set containment isn't amortized worst-case O(1). It's only average-case O(1), with the usual caveat about "average" being over a uniform distribution over possible inputs that may not reflect the inputs you actually receive. Also, big-O isn't just for worst-case times. – user2357112 Sep 19 '16 at 16:58
  • @user2357112: oh hmm, I see now why Stefan made the comment about 'average' not 'amortized'. Right, makes sense - I updated my answer. – Claudiu Sep 19 '16 at 17:17
  • Just in case someone's interested in my own reasoning why set containment isn't amortized O(1), I recently [blabbed about that at LeetCode](https://discuss.leetcode.com/topic/53193/are-hash-tables-ok-here-they-re-not-really-o-1-are-they/9). @Claudiu, your answer still doesn't really answer the question. Saying "It's O(n)" doesn't answer the question "Is it O(1)?". Because it can be both O(n) and O(1). – Stefan Pochmann Sep 19 '16 at 18:35