91

An interesting interview question that a colleague of mine uses:

Suppose that you are given a very long, unsorted list of unsigned 64-bit integers. How would you find the smallest non-negative integer that does not occur in the list?

FOLLOW-UP: Now that the obvious solution by sorting has been proposed, can you do it faster than O(n log n)?

FOLLOW-UP: Your algorithm has to run on a computer with, say, 1GB of memory

CLARIFICATION: The list is in RAM, though it might consume a large amount of it. You are given the size of the list, say N, in advance.

eeerahul
  • 1,629
  • 4
  • 27
  • 38
PeterAllenWebb
  • 10,319
  • 3
  • 37
  • 44
  • 6
    I think you can leave out the non-negative part, seeing how you're talking about an unsigned integer. – KevenDenen Oct 19 '09 at 03:47
  • 4
    The question is pretty basic, unless I am waaaay off-base, IMO, but, as others have mentioned, there are questions to ask, or assumptions that should be stated. – James Black Oct 19 '09 at 03:50
  • That's what makes it a good interview question. Can you do it in less than O(n log n) steps? – PeterAllenWebb Oct 19 '09 at 03:52
  • With two and a half exabytes of memory and a processor capable of addressing it, you can do it in O(n) - see my answer. You'll usually find you can trade off execution time for storage, hardware permitting :-) – paxdiablo Oct 19 '09 at 04:04
  • Okay. So now I'd ask the candidate to give me a few options for trade-offs. What would you suggest? – PeterAllenWebb Oct 19 '09 at 04:09
  • 8
    @paxdiablo: This is a case where saying O(n) doesn't mean that much. Even if you store your 2^64 bit array on clay tablets on Easter Island and access it by carrier pigeon, the algorithm is still O(n). – I. J. Kennedy Oct 19 '09 at 04:14
  • 6
    Changing the memory requirements halfway through makes this a great interview question ;-) – Chris Ballance Oct 19 '09 at 04:14
  • Is the list of numbers stored in RAM, too? – Barry Brown Oct 19 '09 at 04:15
  • @PeterAllenWebb - can you define 'Not in the list'. If I have {2, 3, 5} is the answer 1 or 4? – James Black Oct 19 '09 at 04:16
  • @James, that's answered in the comments to my answer, it should be 0. – paxdiablo Oct 19 '09 at 04:18
  • Do you know the answer? I dont think you can better than nlogn. – DarthVader Oct 19 '09 at 04:19
  • @IJ, The OP mentioned big-O and it *does* matter simply because if you imposed the same conditions (pidgeons, clay tablets) on the sort solution O(n) would still be better. You're saying it doesn't matter because an O(n^2) algorithm in memory would outperform a O(n) with clay tablets. But that's not a fair comparison. I could just as easily propose an O(n^3) solution using as-yet-undiscovered *really* fast equipment which could outperform O(1). – paxdiablo Oct 19 '09 at 04:21
  • 1
    I think it's amusing that all the answers do the same general solution (sort the array and find the first value that breaks the sequence), but they all use a different sort. (Modified quicksort, radix sort, ...) The accepted answer is equivalent to a counting sort that discards elements above N. – Joren Oct 19 '09 at 07:17
  • you cant use counting sort for large sets, it defeats the whole purpose of it, and will become n^2 . – DarthVader Oct 19 '09 at 19:13
  • @paxdiablo - Indeed, if we are allowed to use a quantum computing algorithm, there could be an `O(1)` solution. Modulo that talking about big-O and quantum computing in the same breath could be nonsensical ... :-). Note that quantum computing is not entirely hypothetical. – Stephen C Feb 15 '14 at 03:39

28 Answers28

126

If the datastructure can be mutated in place and supports random access then you can do it in O(N) time and O(1) additional space. Just go through the array sequentially and for every index write the value at the index to the index specified by value, recursively placing any value at that location to its place and throwing away values > N. Then go again through the array looking for the spot where value doesn't match the index - that's the smallest value not in the array. This results in at most 3N comparisons and only uses a few values worth of temporary space.

# Pass 1, move every value to the position of its value
for cursor in range(N):
    target = array[cursor]
    while target < N and target != array[target]:
        new_target = array[target]
        array[target] = target
        target = new_target

# Pass 2, find first location where the index doesn't match the value
for cursor in range(N):
    if array[cursor] != cursor:
        return cursor
return N
Ants Aasma
  • 53,288
  • 15
  • 90
  • 97
  • 9
    Small nitpick. You've missed a trivial case: when the list is {0, ..., N-1}. In that case pass 1 does nothing and in pass 2 array[cursor] == cursor for all the entries in the list, so the algorithm does not return. So you need a 'return N' statement at the end. – Alex Oct 23 '09 at 15:24
  • 12
    Your solution conflates the domain and the range (target is both a value and an index). The range is limited by the available storage to 128M elements, yet the domain is 2G in size. It will fail with a single entry with a value greater than the number of entries that can be allocated into the array. If the question did not specify 'very long', the answer is elegant, even if it destroys the input. The time-space trade off is very apparent in this problem, and a O(N) solution may not be possible under the constraints provided. – Pekka Jan 30 '13 at 00:40
  • 2
    The second pass could use binary search instead of linear search. – user448810 Jul 16 '13 at 02:00
  • 2
    @AntsAasma could you explain why it is at most 3N comparisons? The pass 2 is N comparisons, so why the pass 1 has at most 2N comparisons? – mitchelllc Jan 06 '14 at 04:26
  • 4
    This solution works only if the range of values and index are comparable. – Dubby Jun 23 '14 at 18:58
  • Any intuition on why while loop always terminates? – sinoTrinity Feb 20 '15 at 14:32
  • The loop condition ensures that a[i] != i and the loop body assigns a[i] = i, after each iteration of the loop there is one less position in the array where a[i] != i, this guarantees that the loop body executes at most N times. – Ants Aasma Mar 05 '15 at 03:26
  • This solution will not work if the values are not in range of indexes. Consider {234, 343, 123, 543} – Arry Jun 29 '15 at 11:52
  • 7
    It will work fine with larger values. The larger values can be ignored because they can't have anything to do with the smallest value not in the array. For your example the first pass will loop over the array ignoring all values due to target < N and will then return 0 on the first iteration of the second pass. – Ants Aasma Jul 14 '15 at 10:08
  • Any ideas if it could be done without random access, say on doubly linked list? – Somnium Oct 23 '17 at 05:21
  • How about using `next`? – Zeinab Abbasimazar Nov 28 '17 at 10:27
91

Here's a simple O(N) solution that uses O(N) space. I'm assuming that we are restricting the input list to non-negative numbers and that we want to find the first non-negative number that is not in the list.

  1. Find the length of the list; lets say it is N.
  2. Allocate an array of N booleans, initialized to all false.
  3. For each number X in the list, if X is less than N, set the X'th element of the array to true.
  4. Scan the array starting from index 0, looking for the first element that is false. If you find the first false at index I, then I is the answer. Otherwise (i.e. when all elements are true) the answer is N.

In practice, the "array of N booleans" would probably be encoded as a "bitmap" or "bitset" represented as a byte or int array. This typically uses less space (depending on the programming language) and allows the scan for the first false to be done more quickly.


This is how / why the algorithm works.

Suppose that the N numbers in the list are not distinct, or that one or more of them is greater than N. This means that there must be at least one number in the range 0 .. N - 1 that is not in the list. So the problem of find the smallest missing number must therefore reduce to the problem of finding the smallest missing number less than N. This means that we don't need to keep track of numbers that are greater or equal to N ... because they won't be the answer.

The alternative to the previous paragraph is that the list is a permutation of the numbers from 0 .. N - 1. In this case, step 3 sets all elements of the array to true, and step 4 tells us that the first "missing" number is N.


The computational complexity of the algorithm is O(N) with a relatively small constant of proportionality. It makes two linear passes through the list, or just one pass if the list length is known to start with. There is no need to represent the hold the entire list in memory, so the algorithm's asymptotic memory usage is just what is needed to represent the array of booleans; i.e. O(N) bits.

(By contrast, algorithms that rely on in-memory sorting or partitioning assume that you can represent the entire list in memory. In the form the question was asked, this would require O(N) 64-bit words.)


@Jorn comments that steps 1 through 3 are a variation on counting sort. In a sense he is right, but the differences are significant:

  • A counting sort requires an array of (at least) Xmax - Xmin counters where Xmax is the largest number in the list and Xmin is the smallest number in the list. Each counter has to be able to represent N states; i.e. assuming a binary representation it has to have an integer type (at least) ceiling(log2(N)) bits.
  • To determine the array size, a counting sort needs to make an initial pass through the list to determine Xmax and Xmin.
  • The minimum worst-case space requirement is therefore ceiling(log2(N)) * (Xmax - Xmin) bits.

By contrast, the algorithm presented above simply requires N bits in the worst and best cases.

However, this analysis leads to the intuition that if the algorithm made an initial pass through the list looking for a zero (and counting the list elements if required), it would give a quicker answer using no space at all if it found the zero. It is definitely worth doing this if there is a high probability of finding at least one zero in the list. And this extra pass doesn't change the overall complexity.


EDIT: I've changed the description of the algorithm to use "array of booleans" since people apparently found my original description using bits and bitmaps to be confusing.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • This is the best answer I know of. I think you could probably cut down the memory consumption, but it would ideal in terms of speed. – PeterAllenWebb Oct 19 '09 at 04:37
  • 1
    You beat me to it. This is what I was going to suggest. Here's why it works: By the pigeonhole principle, any gap must be less than N. So, we can discard any values greater than N. For the values less than N, we have a flag for each value. Scan the list and set the flags, then scan the flags to find a hole. – divegeek Oct 19 '09 at 04:39
  • Stephen, please provide the "Why this works" bit. SO is supposed to be a place for answers, not puzzles. – paxdiablo Oct 19 '09 at 04:41
  • i don't understand.. if N is a really large number, step 3 might give me a bitmap with all bits set to 1.. because the list is unsorted, I don't understand how step 4 gives me the right answer – Aditya Mukherji Oct 19 '09 at 04:48
  • 3
    @adi92 If step 3 gives you a bitmap with all bits set to 1, then the list contains every value from 0 to N-1. That means the smallest non-negative integer in the list is N. If there's any value between 0 and N-1 that is NOT in the list, then the corresponding bit will not be set. The smallest such value is therefore the answer. – divegeek Oct 19 '09 at 04:51
  • wait, where in the question specification does it mention that there cannot be duplicates? doesnt this logic work only if there are N distinct numbers? – Aditya Mukherji Oct 19 '09 at 05:00
  • i mean if the list is [1,2,3,1,2,3,1,2,3 .. 100 times], the answer is not 100 – Aditya Mukherji Oct 19 '09 at 05:02
  • 4
    @adi92 In your example, the list would contain 300 elements. That means that if there is any "missing" value, it must be less than 300. Running the algorithm, we'd create a bitfield with 300 slots, then repeatedly set the bits in slots 1, 2, and 3, leaving all of the other slots -- 0 and 4 through 299 -- clear. When scanning the bitfield we'd find the flag in slot 0 clear, so we'd know 0 is the answer. – divegeek Oct 19 '09 at 05:23
  • 4
    Note that this algorithm might be more simply understood without the bit twiddling: "Create a Boolean array of size N" etc. Once you understand it that way, moving to a bitwise version is conceptually easy. – Jon Skeet Oct 19 '09 at 05:27
  • Turning my -1 into a +1 now that you've explained it. Pretty clever, it's actually the same as my bitmap solution but without the 2.5 exabyte memory requirement :-) - I had to make a minor edit to allow the vote reversal. – paxdiablo Oct 19 '09 at 05:39
  • 1
    Conceptually boolean array and bitmap are the same thing. – starblue Oct 19 '09 at 06:38
  • 2
    When giving an abstract solution, use the conceptually simplest way that works, and don't overly specialize. Your solution screams for the use of an (abstract) boolean array, so call it that. That you might implement this array by `bool[]` or by a bitmap is irrelevant to the general solution. – Joren Oct 19 '09 at 07:11
  • 2
    I think this solution might be best described by "Use a counting sort that disregards elements above N, then find the first missing element by doing a linear search from the start." – Joren Oct 19 '09 at 07:20
  • To me, bitmap == 2d image. so this was a bit confusing... but it makes total sense now that I understand that is a 1d bool array :p – Svish Oct 19 '09 at 12:03
  • Your comments on Counting Sort are correct and valid, but I didn't say 'that disregards elements above N' for nothing. :) Xmax becomes N, and with Xmin set to 0 (which I forgot isn't standard counting sort) the issues are solved. Anyway, I like your elaboration and your answer had always been the best. – Joren Oct 20 '09 at 08:09
  • I implemented this and there is a very small edge case that this fails... consider [1,2], anyone have a idea how to deal with this case? implementing some logical check breaks other cases – Apprentice Programmer Jul 31 '17 at 16:28
  • I think the problem is that you have a bug in your implementation of the pseudo code above. But we can't help you unless you show us the code. Ask a new question, providing an MCVE. – Stephen C Jul 31 '17 at 23:23
14

Since the OP has now specified that the original list is held in RAM and that the computer has only, say, 1GB of memory, I'm going to go out on a limb and predict that the answer is zero.

1GB of RAM means the list can have at most 134,217,728 numbers in it. But there are 264 = 18,446,744,073,709,551,616 possible numbers. So the probability that zero is in the list is 1 in 137,438,953,472.

In contrast, my odds of being struck by lightning this year are 1 in 700,000. And my odds of getting hit by a meteorite are about 1 in 10 trillion. So I'm about ten times more likely to be written up in a scientific journal due to my untimely death by a celestial object than the answer not being zero.

Barry Brown
  • 20,233
  • 15
  • 69
  • 105
  • 11
    Your calculation only holds if the values are uniformly distributed and selected at random. They could just as well have been generated sequentially. – divegeek Oct 19 '09 at 04:35
  • 1
    You're correct, of course. But I'm all about optimizing for the common case. :) – Barry Brown Oct 19 '09 at 04:37
  • 10
    So, what are the odds of interviewee getting selected with this answer? – Amarghosh Oct 19 '09 at 08:36
  • 6
    The question does not say the numbers are selected uniformly at random. They are selected by the person setting this question. Given this, the probability of 0 being in the list is *much* larger than 1 in 137,438,953,472, probably even larger than 1 in 2. :-) – ShreevatsaR Oct 19 '09 at 12:52
  • 8
    @Amarghosh The answer to that question is also zero. – PeterAllenWebb Oct 19 '09 at 19:48
  • You don't know if it's the common case. You just optimized for one of the possible cases, and not the case the problem was asking about. – djechlin Nov 01 '16 at 23:34
  • I don't quite get it... the question says a very long array... it can be of a size of 10,000, so any non-negative integer missing is possible... – nonopolarity Sep 04 '19 at 11:26
10

As pointed out in other answers you can do a sort, and then simply scan up until you find a gap.

You can improve the algorithmic complexity to O(N) and keep O(N) space by using a modified QuickSort where you eliminate partitions which are not potential candidates for containing the gap.

  • On the first partition phase, remove duplicates.
  • Once the partitioning is complete look at the number of items in the lower partition
  • Is this value equal to the value used for creating the partition?
    • If so then it implies that the gap is in the higher partition.
      • Continue with the quicksort, ignoring the lower partition
    • Otherwise the gap is in the lower partition
      • Continue with the quicksort, ignoring the higher partition

This saves a large number of computations.

cdiggins
  • 17,602
  • 7
  • 105
  • 102
  • That's pretty nifty. It would assume you can compute the length of the partition in less than linear time, which can be done if that's stored along with the partition array. It also assumes the original list is held in RAM. – Barry Brown Oct 19 '09 at 04:20
  • 2
    If you know the length of the list, you can also cull any values greater than len(list). By the pigeonhole principle, any 'holes' must be less than len(list). – divegeek Oct 19 '09 at 04:33
  • 1
    I don't think that's O(n)... For one, I'm not sure you can remove duplicates until a list is fully sorted. Secondly, while you can guarantee the throwing away of half the search space each iteration (because you've divided into under and over the midpoint), you *still* have multiple passes (dependent on n) over data that's dependent on n. – paxdiablo Oct 19 '09 at 04:34
  • 1
    paxdiablo: You can build a new list with only unique values by using a bitmap method like what Stephen C proposed. This runs in O(n) time and space. I'm not sure if it can be done better than that. – Nic Oct 19 '09 at 08:46
8

Since the numbers are all 64 bits long, we can use radix sort on them, which is O(n). Sort 'em, then scan 'em until you find what you're looking for.

if the smallest number is zero, scan forward until you find a gap. If the smallest number is not zero, the answer is zero.

Barry Brown
  • 20,233
  • 15
  • 69
  • 105
8

To illustrate one of the pitfalls of O(N) thinking, here is an O(N) algorithm that uses O(1) space.

for i in [0..2^64):
  if i not in list: return i

print "no 64-bit integers are missing"
I. J. Kennedy
  • 24,725
  • 16
  • 62
  • 87
  • Breaking out the the loop early makes the runtime O(n^2). – Will Harris Oct 19 '09 at 06:07
  • 1
    Will is right. This is not O(n) because you actually have two loops here, but one is implicit. Determining whether a value is in a list is an O(n) operation, and you're doing that n times in your for loop. That makes it O(n^2). – Nic Oct 19 '09 at 08:32
  • 7
    Nic, Will, it's O(n * N) where n is the size of the list and N is the size of the domain (64bit integers). While N is a huge number, it is still a constant so formally the complexity for the problem as stated is O(n). – Ants Aasma Oct 19 '09 at 10:50
  • 1
    Ants, I agree that it's O(n*N), but N is not constant. Because the algorithm finishes when it finds the answer, the number of complete iterations through the outer loop is equal to the answer, which itself is bound by the size of the list. So, O(N*n) is O(n^2) in this case. – Will Harris Oct 19 '09 at 12:52
  • 13
    Looking for a number in a list of N elements is clearly O(N). We do this 2^64 times. While large, 2^64 is a CONSTANT. Therefore the algorithm is C*O(N), which is still O(N). – I. J. Kennedy Oct 19 '09 at 14:06
  • 3
    I must recant my previous statement; by the strictest definition, this operation is indeed O(n). – Nic Oct 19 '09 at 16:15
  • While it's a little amusing to speculate on ways of improving this algorithm, a bloom filter would certainly help to reduce the overhead of repeatedly looking for a number in a list of N elements. – Zach Conn Jan 08 '14 at 15:57
  • 1
    You suppose that N is greater than n, so 2Nk <= 2nk for every k in 0..n, and so the number of operations is O(n^2) per definition of O. You're playing with fire, supposing that there's a higher bound of possible integers conflicts with the genericity of the O notation. – floribon Feb 04 '17 at 23:48
  • O(n^2), you claim that by picking an N=2^64 and claiming that N>n, but then your algorithm only works for n<=N; refute by choosing n>N (your claimed constant). – ChuckCottrill Apr 13 '18 at 02:08
  • How about converting the list to a set and there by making the membership check O(1) ? Hence the solution becomes O(n) time and space. – Prithiviraj Damodaran Oct 24 '20 at 16:55
  • The argument that this algorithm is O(n) is the same as the argument that all algorithms that run on actual, physical digital computers run in O(1) time because all actual computers have finite resources so ultimately all variables are smaller than some constant and thus can be regarded as constants i.e. it is a spurious argument. In the above *program* 2^64 is indeed a constant, but running time of the abstract algorithm it reifies depends on the maximum value, k, that may be in the list, not just the size of the list, thus we must say the algorithm's running time is O(k*n) – jwezorek Aug 25 '23 at 15:41
5

For a space efficient method and all values are distinct you can do it in space O( k ) and time O( k*log(N)*N ). It's space efficient and there's no data moving and all operations are elementary (adding subtracting).

  1. set U = N; L=0
  2. First partition the number space in k regions. Like this:
    • 0->(1/k)*(U-L) + L, 0->(2/k)*(U-L) + L, 0->(3/k)*(U-L) + L ... 0->(U-L) + L
  3. Find how many numbers (count{i}) are in each region. (N*k steps)
  4. Find the first region (h) that isn't full. That means count{h} < upper_limit{h}. (k steps)
  5. if h - count{h-1} = 1 you've got your answer
  6. set U = count{h}; L = count{h-1}
  7. goto 2

this can be improved using hashing (thanks for Nic this idea).

  1. same
  2. First partition the number space in k regions. Like this:
    • L + (i/k)->L + (i+1/k)*(U-L)
  3. inc count{j} using j = (number - L)/k (if L < number < U)
  4. find first region (h) that doesn't have k elements in it
  5. if count{h} = 1 h is your answer
  6. set U = maximum value in region h L = minimum value in region h

This will run in O(log(N)*N).

Michael Lihs
  • 7,460
  • 17
  • 52
  • 85
Egon
  • 1,705
  • 18
  • 32
  • I really like this answer. It was a bit hard to read, but it's very similar to what I had in my head when I read the question. – Nic Oct 19 '09 at 08:27
  • also at some point it would be smart to switch to that bitmap solution by Stephen C. probably when `U-L < k` – Egon Oct 19 '09 at 12:31
  • This does not run in O(log(N)*N) but in O(N). Your answer is a generalization of @cdiggins answer and it runs in O(N) because sum(1/k**i for i in range(ceil(log_k(n)))) <= 2. – Lapinot Jan 24 '16 at 12:28
  • On each iteration you go through O(N) numbers, it takes O(log_k(N)) total iterations. Hence O(log_k(N)*N) == O(log(N)*N). The original numbers aren't sorted/bucketed and you need to go through all of them. – Egon Jan 24 '16 at 18:49
  • But if you partitioned the original list in k regions (of size n/k) then you select the first region that isn't full. Hence in the next iteration you only need to consider the selected region and divide it in k new regions (of size n/k**2) etc. Actually you dont iterate on the whole list every time (else what is the point of partitioning?). – Lapinot Jan 24 '16 at 21:35
  • You partition the number space not the numbers themselves. There is no way this can work in O(k) memory and without modifying original numbers, otherwise. – Egon Jan 25 '16 at 05:45
  • Hmm I understand now. Actually so i misunderstood your algorithm (but it also works if you partition the numbers: you get O(k*n) time and O(n/k) space). So your algorithm is like the in-place sorting method but without disrupting anything, interesting :) – Lapinot Jan 25 '16 at 18:22
3

I'd just sort them then run through the sequence until I find a gap (including the gap at the start between zero and the first number).

In terms of an algorithm, something like this would do it:

def smallest_not_in_list(list):
    sort(list)
    if list[0] != 0:
        return 0
    for i = 1 to list.last:
        if list[i] != list[i-1] + 1:
            return list[i-1] + 1
    if list[list.last] == 2^64 - 1:
        assert ("No gaps")
    return list[list.last] + 1

Of course, if you have a lot more memory than CPU grunt, you could create a bitmask of all possible 64-bit values and just set the bits for every number in the list. Then look for the first 0-bit in that bitmask. That turns it into an O(n) operation in terms of time but pretty damned expensive in terms of memory requirements :-)

I doubt you could improve on O(n) since I can't see a way of doing it that doesn't involve looking at each number at least once.

The algorithm for that one would be along the lines of:

def smallest_not_in_list(list):
    bitmask = mask_make(2^64) // might take a while :-)
    mask_clear_all (bitmask)
    for i = 1 to list.last:
        mask_set (bitmask, list[i])
    for i = 0 to 2^64 - 1:
        if mask_is_clear (bitmask, i):
            return i
    assert ("No gaps")
paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • From the description it seems to preclude 0 to the first element, as it is the smallest not in the list. But, that is an assumption I made, I could be wrong. – James Black Oct 19 '09 at 03:49
  • My thoughts were that if the sorted sequence was 4,5,6, then 0 would be the smallest not in the list. – paxdiablo Oct 19 '09 at 03:51
  • I expect that 2, 3, 5, the answer should be 4, but, I could be wrong. – James Black Oct 19 '09 at 03:54
  • A question that should be answered by the OP. Is the search space "all 64-bit unsigned integers" or "all numbers between the lowest and highest in the list"? – paxdiablo Oct 19 '09 at 03:57
  • I agree that in the worst case you have to look at least once, unless it was already sorted in a binary tree perhaps. – James Black Oct 19 '09 at 03:57
  • @paxdiablo The question should be taken completely literally. The smallest integer that does not occur in the list. Theoretically the list could contain all 2^64 integers, but let's just say that will never happen since it would require an imposing amount of memory. – PeterAllenWebb Oct 19 '09 at 04:01
  • @JamesBlack The answer for your example of [2,3,5] should be 0. – PeterAllenWebb Oct 19 '09 at 04:06
  • Then you just sort them. Then if n[0] != 0, the answer is 0. Otherwise the answer is n[i]+1 for the first i where n[i+1] != n[i]+1. – paxdiablo Oct 19 '09 at 04:06
2

Sort the list, look at the first and second elements, and start going up until there is a gap.

James Black
  • 41,583
  • 10
  • 86
  • 166
2

We could use a hash table to hold the numbers. Once all numbers are done, run a counter from 0 till we find the lowest. A reasonably good hash will hash and store in constant time, and retrieves in constant time.

for every i in X         // One scan Θ(1)
   hashtable.put(i, i);  // O(1)

low = 0;

while (hashtable.get(i) <> null)   // at most n+1 times
   low++;

print low;

The worst case if there are n elements in the array, and are {0, 1, ... n-1}, in which case, the answer will be obtained at n, still keeping it O(n).

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
Milind C
  • 21
  • 3
1

You can do it in O(n) time and O(1) additional space, although the hidden factor is quite large. This isn't a practical way to solve the problem, but it might be interesting nonetheless.

For every unsigned 64-bit integer (in ascending order) iterate over the list until you find the target integer or you reach the end of the list. If you reach the end of the list, the target integer is the smallest integer not in the list. If you reach the end of the 64-bit integers, every 64-bit integer is in the list.

Here it is as a Python function:

def smallest_missing_uint64(source_list):
    the_answer = None

    target = 0L
    while target < 2L**64:

        target_found = False
        for item in source_list:
            if item == target:
                target_found = True

        if not target_found and the_answer is None:
            the_answer = target

        target += 1L

    return the_answer

This function is deliberately inefficient to keep it O(n). Note especially that the function keeps checking target integers even after the answer has been found. If the function returned as soon as the answer was found, the number of times the outer loop ran would be bound by the size of the answer, which is bound by n. That change would make the run time O(n^2), even though it would be a lot faster.

Will Harris
  • 21,597
  • 12
  • 64
  • 64
  • True. It amusing how horribly some of the algorithms that are O(1) space and O(n) time fail in practice with this question. – PeterAllenWebb Oct 19 '09 at 13:38
1

Thanks to egon, swilden, and Stephen C for my inspiration. First, we know the bounds of the goal value because it cannot be greater than the size of the list. Also, a 1GB list could contain at most 134217728 (128 * 2^20) 64-bit integers.

Hashing part
I propose using hashing to dramatically reduce our search space. First, square root the size of the list. For a 1GB list, that's N=11,586. Set up an integer array of size N. Iterate through the list, and take the square root* of each number you find as your hash. In your hash table, increment the counter for that hash. Next, iterate through your hash table. The first bucket you find that is not equal to it's max size defines your new search space.

Bitmap part
Now set up a regular bit map equal to the size of your new search space, and again iterate through the source list, filling out the bitmap as you find each number in your search space. When you're done, the first unset bit in your bitmap will give you your answer.

This will be completed in O(n) time and O(sqrt(n)) space.

(*You could use use something like bit shifting to do this a lot more efficiently, and just vary the number and size of buckets accordingly.)

Nic
  • 4,319
  • 5
  • 29
  • 36
  • 1
    I like the idea of dividing the search space into Root-N buckets to reduce memory footprint, but duplicates in the list would break this method. I do wonder if it can be fixed. – PeterAllenWebb Oct 19 '09 at 14:29
  • You're right, I neglected to consider duplicate entries. I'm not sure that can be worked around. – Nic Oct 19 '09 at 15:55
1

Well if there is only one missing number in a list of numbers, the easiest way to find the missing number is to sum the series and subtract each value in the list. The final value is the missing number.

Jeff Lundstrom
  • 1,901
  • 2
  • 11
  • 5
1
 int i = 0;
            while ( i < Array.Length)
            {

                if (Array[i] == i + 1)
                {
                    i++;
                }

                if (i < Array.Length)
                {
                    if (Array[i] <= Array.Length)
                    {//SWap

                        int temp = Array[i];
                        int AnoTemp = Array[temp - 1];
                        Array[temp - 1] = temp;
                        Array[i] = AnoTemp;

                    }
                    else
                       i++;



                }
            }

            for (int j = 0; j < Array.Length; j++)
            {
                if (Array[j] > Array.Length)
                {
                    Console.WriteLine(j + 1);
                    j = Array.Length;
                }
                else
                    if (j == Array.Length - 1)
                        Console.WriteLine("Not Found !!");

            }
        }
ranjeet
  • 540
  • 7
  • 16
1

Here's my answer written in Java:

Basic Idea: 1- Loop through the array throwing away duplicate positive, zeros, and negative numbers while summing up the rest, getting the maximum positive number as well, and keep the unique positive numbers in a Map.

2- Compute the sum as max * (max+1)/2.

3- Find the difference between the sums calculated at steps 1 & 2

4- Loop again from 1 to the minimum of [sums difference, max] and return the first number that is not in the map populated in step 1.

public static int solution(int[] A) {
    if (A == null || A.length == 0) {
        throw new IllegalArgumentException();
    }

    int sum = 0;
    Map<Integer, Boolean> uniqueNumbers = new HashMap<Integer, Boolean>();
    int max = A[0];
    for (int i = 0; i < A.length; i++) {
        if(A[i] < 0) {
            continue;
        }
        if(uniqueNumbers.get(A[i]) != null) {
            continue;
        }
        if (A[i] > max) {
            max = A[i];
        }
        uniqueNumbers.put(A[i], true);
        sum += A[i];
    }
    int completeSum = (max * (max + 1)) /  2;
    for(int j = 1; j <= Math.min((completeSum - sum), max); j++) {
        if(uniqueNumbers.get(j) == null) { //O(1)
            return j;
        }
    }
    //All negative case
    if(uniqueNumbers.isEmpty()) {
        return 1;
    }
    return 0;
}
Rami
  • 7,162
  • 1
  • 22
  • 19
0

The Dafny fragment from Ants' answer shows why the in-place algorithm may fail. The requires pre-condition describes that the values of each item must not go beyond the bounds of the array.

method AntsAasma(A: array<int>) returns (M: int)
  requires A != null && forall N :: 0 <= N < A.Length ==> 0 <= A[N] < A.Length;
  modifies A; 
{
  // Pass 1, move every value to the position of its value
  var N := A.Length;
  var cursor := 0;
  while (cursor < N)
  {
    var target := A[cursor];
    while (0 <= target < N && target != A[target])
    {
        var new_target := A[target];
        A[target] := target;
        target := new_target;
    }
    cursor := cursor + 1;
  }

  // Pass 2, find first location where the index doesn't match the value
  cursor := 0;
  while (cursor < N)
  {
    if (A[cursor] != cursor)
    {
      return cursor;
    }
    cursor := cursor + 1;
  }
  return N;
}

Paste the code into the validator with and without the forall ... clause to see the verification error. The second error is a result of the verifier not being able to establish a termination condition for the Pass 1 loop. Proving this is left to someone who understands the tool better.

Pekka
  • 3,529
  • 27
  • 45
0

I am not sure if I got the question. But if for list 1,2,3,5,6 and the missing number is 4, then the missing number can be found in O(n) by: (n+2)(n+1)/2-(n+1)n/2

EDIT: sorry, I guess I was thinking too fast last night. Anyway, The second part should actually be replaced by sum(list), which is where O(n) comes. The formula reveals the idea behind it: for n sequential integers, the sum should be (n+1)*n/2. If there is a missing number, the sum would be equal to the sum of (n+1) sequential integers minus the missing number.

Thanks for pointing out the fact that I was putting some middle pieces in my mind.

Codism
  • 5,928
  • 6
  • 28
  • 29
  • 1
    I do not, at first glance see how this would work. In your case n=5 and the formulera will be fixed, no matter what number in it was missing. – sisve Oct 19 '09 at 04:49
  • Simon: could you now please remove the down vote according to my edit? – Codism Oct 19 '09 at 18:14
0

As Stephen C smartly pointed out, the answer must be a number smaller than the length of the array. I would then find the answer by binary search. This optimizes the worst case (so the interviewer can't catch you in a 'what if' pathological scenario). In an interview, do point out you are doing this to optimize for the worst case.

The way to use binary search is to subtract the number you are looking for from each element of the array, and check for negative results.

Emilio M Bumachar
  • 2,532
  • 3
  • 26
  • 30
0

I like the "guess zero" apprach. If the numbers were random, zero is highly probable. If the "examiner" set a non-random list, then add one and guess again:

LowNum=0
i=0
do forever {
  if i == N then leave /* Processed entire array */
  if array[i] == LowNum {
     LowNum++
     i=0
     }
   else {
     i++
   }
}
display LowNum

The worst case is n*N with n=N, but in practice n is highly likely to be a small number (eg. 1)

NealB
  • 16,670
  • 2
  • 39
  • 60
0

Well done Ants Aasma! I thought about the answer for about 15 minutes and independently came up with an answer in a similar vein of thinking to yours:

#define SWAP(x,y) { numerictype_t tmp = x; x = y; y = tmp; }
int minNonNegativeNotInArr (numerictype_t * a, size_t n) {
    int m = n;
    for (int i = 0; i < m;) {
        if (a[i] >= m || a[i] < i || a[i] == a[a[i]]) {
            m--;
            SWAP (a[i], a[m]);
            continue;
        }
        if (a[i] > i) {
            SWAP (a[i], a[a[i]]);
            continue;
        }
        i++;
    }
    return m;
}

m represents "the current maximum possible output given what I know about the first i inputs and assuming nothing else about the values until the entry at m-1".

This value of m will be returned only if (a[i], ..., a[m-1]) is a permutation of the values (i, ..., m-1). Thus if a[i] >= m or if a[i] < i or if a[i] == a[a[i]] we know that m is the wrong output and must be at least one element lower. So decrementing m and swapping a[i] with the a[m] we can recurse.

If this is not true but a[i] > i then knowing that a[i] != a[a[i]] we know that swapping a[i] with a[a[i]] will increase the number of elements in their own place.

Otherwise a[i] must be equal to i in which case we can increment i knowing that all the values of up to and including this index are equal to their index.

The proof that this cannot enter an infinite loop is left as an exercise to the reader. :)

Paul Hsieh
  • 1,466
  • 7
  • 9
0

Here's an answer in Java that does not modify the input and uses O(N) time and N bits plus a small constant overhead of memory (where N is the size of the list):

int smallestMissingValue(List<Integer> values) {
    BitSet bitset = new BitSet(values.size() + 1);
    for (int i : values) {
        if (i >= 0 && i <= values.size()) {
            bitset.set(i);
        }
    }
    return bitset.nextClearBit(0);
}
Dave L.
  • 43,907
  • 11
  • 63
  • 62
0
def solution(A):

index = 0
target = []
A = [x for x in A if x >=0]

if len(A) ==0:
    return 1

maxi = max(A)
if maxi <= len(A):
    maxi = len(A)

target = ['X' for x in range(maxi+1)]
for number in A:
    target[number]= number

count = 1
while count < maxi+1:
    if target[count] == 'X':
        return count
    count +=1
return target[count-1] + 1

Got 100% for the above solution.

Angelo
  • 61
  • 1
  • 8
0

1)Filter negative and Zero

2)Sort/distinct

3)Visit array

Complexity: O(N) or O(N * log(N))

using Java8

public int solution(int[] A) {
            int result = 1;
    boolean found = false;
    A = Arrays.stream(A).filter(x -> x > 0).sorted().distinct().toArray();
    //System.out.println(Arrays.toString(A));
    for (int i = 0; i < A.length; i++) {
        result = i + 1;
        if (result != A[i]) {
            found = true;
            break;
        }
    }
    if (!found && result == A.length) {
        //result is larger than max element in array
        result++;
    }
    return result;
}
0

An unordered_set can be used to store all the positive numbers, and then we can iterate from 1 to length of unordered_set, and see the first number that does not occur.

int firstMissingPositive(vector<int>& nums) {

    unordered_set<int> fre;
    // storing each positive number in a hash.
    for(int i = 0; i < nums.size(); i +=1)
    {
        if(nums[i] > 0)
            fre.insert(nums[i]);
     }

    int i = 1;
    // Iterating from 1 to size of the set and checking 
    // for the occurrence of 'i'

    for(auto it = fre.begin(); it != fre.end(); ++it)
    {
        if(fre.find(i) == fre.end())
            return i;
        i +=1;
    }

    return i;
}
Mohit Anand
  • 11
  • 2
  • 6
0

Solution through basic javascript

var a = [1, 3, 6, 4, 1, 2];

function findSmallest(a) {
var m = 0;
  for(i=1;i<=a.length;i++) {
    j=0;m=1;
    while(j < a.length) {
      if(i === a[j]) {
        m++;
      }
      j++;
    }
    if(m === 1) {
      return i;
    }
  }
}

console.log(findSmallest(a))

Hope this helps for someone.

Manoj
  • 2,000
  • 2
  • 16
  • 21
0

With python it is not the most efficient, but correct

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import datetime

# write your code in Python 3.6

def solution(A):
    MIN = 0
    MAX = 1000000
    possible_results = range(MIN, MAX)

    for i in possible_results:
        next_value = (i + 1)
        if next_value not in A:
            return next_value
    return 1

test_case_0 = [2, 2, 2]
test_case_1 = [1, 3, 44, 55, 6, 0, 3, 8]
test_case_2 = [-1, -22]
test_case_3 = [x for x in range(-10000, 10000)]
test_case_4 = [x for x in range(0, 100)] + [x for x in range(102, 200)]
test_case_5 = [4, 5, 6]
print("---")
a = datetime.datetime.now()
print(solution(test_case_0))
print(solution(test_case_1))
print(solution(test_case_2))
print(solution(test_case_3))
print(solution(test_case_4))
print(solution(test_case_5))
smentek
  • 2,820
  • 1
  • 28
  • 32
0
def solution(A):
    A.sort()
    j = 1
    for i, elem in enumerate(A):
        if j < elem:
            break
        elif j == elem:
            j += 1
            continue
        else:
            continue
    return j
orfeu
  • 1
0

this can help:

0- A is [5, 3, 2, 7];
1- Define B With Length = A.Length;                            (O(1))
2- initialize B Cells With 1;                                  (O(n))
3- For Each Item In A:
        if (B.Length <= item) then B[Item] = -1                (O(n))
4- The answer is smallest index in B such that B[index] != -1  (O(n))
Hamed
  • 150
  • 2
  • 16