33

This problem was asked to me in Amazon interview -

Given a array of positive integers, you have to find the smallest positive integer that can not be formed from the sum of numbers from array.

Example:

Array:[4 13 2 3 1]
result= 11 { Since 11 was smallest positive number which can not be formed from the given array elements }


What i did was :

  1. sorted the array
  2. calculated the prefix sum
  3. Treverse the sum array and check if next element is less than 1 greater than sum i.e. A[j]<=(sum+1). If not so then answer would be sum+1

But this was nlog(n) solution.

Interviewer was not satisfied with this and asked a solution in less than O(n log n) time.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
user3187810
  • 333
  • 1
  • 3
  • 6
  • 1
    Are you saying that the interviewer asked for a O(logn) solution? That's obviously not possible because you have to look at each array value once, which would take at least O(n). – interjay Jan 12 '14 at 17:23
  • 4
    Probably need to be more specific here: Smallest integer greater than zero which can not be created by summing any combination of the elements of the array, perhaps? – DavidO Jan 12 '14 at 17:24
  • He asked for a solution lesser than nlog(n) – user3187810 Jan 12 '14 at 17:26
  • Yes it was smallest positive number – user3187810 Jan 12 '14 at 17:28
  • 1
    Are the array elements all positive integers? Can there be duplicates? – interjay Jan 12 '14 at 17:29
  • Yes all array elements are positive integers – user3187810 Jan 12 '14 at 17:30
  • @interjay Yes elements may be repeated – user3187810 Jan 12 '14 at 17:33
  • 1
    Does the problem's spec guarantee a maximum possible integer value substantially less than INT_MAX? – DavidO Jan 12 '14 at 17:35
  • 1
    Sketch of a solution: Start with an empty red-black tree. For each element `j` of the input array: traverse the tree and for each node `n`, add `j + n` to the tree if not already in the tree, then add `j` to the tree. When completed, traverse the tree checking to see if the current node is equal to the last node + 1. If not, you have found the solution. If you reach the end of the tree, the solution is the last leaf's value + 1. There may be a better data structure considering the amount of inserts, though. – Conspicuous Compiler Jan 12 '14 at 17:36
  • @DavidO Yes Value will fit within interger range i.e. 10^9 – user3187810 Jan 12 '14 at 17:39
  • 2
    Isn't this coincidently very similar to this question that was asked yesterday? http://stackoverflow.com/questions/21060873/minimum-sum-that-cant-be-obtained-from-a-set – Abhishek Bansal Jan 12 '14 at 17:41
  • @ConspicuousCompiler Will your solution work for repeated elements ? – user3187810 Jan 12 '14 at 17:43
  • @ConspicuousCompiler That solution is O(n^2 * logn), you are adding O(n^2) elements to a tree and each addition is O(logn). – interjay Jan 12 '14 at 17:45
  • [Pretty much identical (closed) question](http://stackoverflow.com/questions/21039634/backtracking-algorithm-postage-stamp). – Bernhard Barker Jan 12 '14 at 17:45
  • Can the result of the algorithm be 0? That is, are you considering 0 a positive integer? Similarly, can 0 appear in the array? – Apriori Jan 12 '14 at 19:23
  • 1
    @Rich From the example, 0 isn't the smallest number, 11 is, so I think we can safely assume that it's not a valid answer. – Bernhard Barker Jan 12 '14 at 19:31
  • @Dukeling thank you, yes you're right, I didn't consider the example long enough. – Apriori Jan 12 '14 at 20:11
  • @Rich Given a more thorough description of the problem I've found elsewhere, 0 is always included in the numbers that can be generated, because it is legal to select no elements, which results in a zero. ie: Given `{ 1 3 5 7 }`, selecting `{ }` (the empty range) returns `0`. So it's never an impossible number. – DavidO Jan 13 '14 at 03:36
  • @interjay Quite right. Didn't evaluate my solution very well. Cheers! – Conspicuous Compiler Jan 15 '14 at 08:44

5 Answers5

55

There's a beautiful algorithm for solving this problem in time O(n + Sort), where Sort is the amount of time required to sort the input array.

The idea behind the algorithm is to sort the array and then ask the following question: what is the smallest positive integer you cannot make using the first k elements of the array? You then scan forward through the array from left to right, updating your answer to this question, until you find the smallest number you can't make.

Here's how it works. Initially, the smallest number you can't make is 1. Then, going from left to right, do the following:

  • If the current number is bigger than the smallest number you can't make so far, then you know the smallest number you can't make - it's the one you've got recorded, and you're done.
  • Otherwise, the current number is less than or equal to the smallest number you can't make. The claim is that you can indeed make this number. Right now, you know the smallest number you can't make with the first k elements of the array (call it candidate) and are now looking at value A[k]. The number candidate - A[k] therefore must be some number that you can indeed make with the first k elements of the array, since otherwise candidate - A[k] would be a smaller number than the smallest number you allegedly can't make with the first k numbers in the array. Moreover, you can make any number in the range candidate to candidate + A[k], inclusive, because you can start with any number in the range from 1 to A[k], inclusive, and then add candidate - 1 to it. Therefore, set candidate to candidate + A[k] and increment k.

In pseudocode:

Sort(A)
candidate = 1
for i from 1 to length(A):
   if A[i] > candidate: return candidate
   else: candidate = candidate + A[i]
return candidate

Here's a test run on [4, 13, 2, 1, 3]. Sort the array to get [1, 2, 3, 4, 13]. Then, set candidate to 1. We then do the following:

  • A[1] = 1, candidate = 1:
    • A[1] ≤ candidate, so set candidate = candidate + A[1] = 2
  • A[2] = 2, candidate = 2:
    • A[2] ≤ candidate, so set candidate = candidate + A[2] = 4
  • A[3] = 3, candidate = 4:
    • A[3] ≤ candidate, so set candidate = candidate + A[3] = 7
  • A[4] = 4, candidate = 7:
    • A[4] ≤ candidate, so set candidate = candidate + A[4] = 11
  • A[5] = 13, candidate = 11:
    • A[5] > candidate, so return candidate (11).

So the answer is 11.

The runtime here is O(n + Sort) because outside of sorting, the runtime is O(n). You can clearly sort in O(n log n) time using heapsort, and if you know some upper bound on the numbers you can sort in time O(n log U) (where U is the maximum possible number) by using radix sort. If U is a fixed constant, (say, 109), then radix sort runs in time O(n) and this entire algorithm then runs in time O(n) as well.

Hope this helps!

Kerem
  • 2,867
  • 24
  • 35
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • 1
    It should be `candidate = candidate + A[i]` in the `else`, without the `-1`. This is exactly the same algorithm as given by OP, but the explanation is very helpful. – interjay Jan 12 '14 at 18:00
  • this goes to nlogn solution..i knewed this one but could you provide anything better than this one ? – user3187810 Jan 12 '14 at 18:00
  • 1
    @user3187810- This solution is pretty fast - it runs in no worse than O(n log n) time and possibly a lot better if you can sort the integers using something like radix sort. – templatetypedef Jan 12 '14 at 18:02
  • 1
    @interjay: I updated the answer. I didn't realize when I was writing this that it ended up being identical to the OP's answer. Now that I realize this, I think the answer is still useful in that it provides a justification for the answer and also shows how to speed it up (namely, improving the sorting step). If you think this isn't necessary, though, I can delete this answer. – templatetypedef Jan 12 '14 at 18:04
  • I agree, you should leave it here. I upvoted it for the explanation. – interjay Jan 12 '14 at 18:07
  • @templatetypedef there exist some O(n) solution which interviewer wanted to know.. there are many nlog(n) solution to this – user3187810 Jan 12 '14 at 18:09
  • 6
    @user3187810- If the integers have some fixed upper bound (say, 10^9), you can sort them in time O(n) by using counting sort or radix sort. That would then drop the total runtime to O(n). – templatetypedef Jan 12 '14 at 18:10
  • 1
    If the numbers in the array are randomly generated, a statistically significant improvement can be made by simply checking if 1 exists before performing the rest of the algorithm. – neeKo Jan 12 '14 at 21:15
  • How do we know that A[k]+candidate cannot be made using the A[0] through A[k]? – Ekalavya Jan 04 '19 at 13:17
  • We haven't proved that A[0]+...+A[k-1] < candidate. We only know that no subset of A[0], ... A[k-1] adds up to candidate. – Ekalavya Jan 05 '19 at 01:53
  • Suppose A[0]+...+A[k-1] > candidate. In principle, I might drop several elements of this sum and add A[k] -- that could conceivably equal candidate. No? Your argument applies only if I drop a single element and add A[k]. – Ekalavya Jan 05 '19 at 04:45
  • @Ekalavya Wait, disregard my previous argument. The strong loop invariant is that (1) we can make any number less than `candidate` using A[0] through A[k-1], and (2) we can't make any number at or above `candidate` using A[0] through A[k-1]. So now suppose we could make `candidate` + A[k] using A[0] through A[k]. The only way to do this would be to use A[k], since we can't make anything at or above `candidate` using A[0] through A[k-1]. So if we then remove A[k] from that summation, whatever's left has to be a subset of A[0] through A[k-1] adding up to `candidate`, which is impossible. – templatetypedef Jan 05 '19 at 05:05
  • Now it is correct. Perhaps you should change your original answer -- if possible. – Ekalavya Jan 05 '19 at 12:46
9

Use bitvectors to accomplish this in linear time.

Start with an empty bitvector b. Then for each element k in your array, do this:

b = b | b << k | 2^(k-1)

To be clear, the i'th element is set to 1 to represent the number i, and | k is setting the k-th element to 1.

After you finish processing the array, the index of the first zero in b is your answer (counting from the right, starting at 1).

  1. b=0
  2. process 4: b = b | b<<4 | 1000 = 1000
  3. process 13: b = b | b<<13 | 1000000000000 = 10001000000001000
  4. process 2: b = b | b<<2 | 10 = 1010101000000101010
  5. process 3: b = b | b<<3 | 100 = 1011111101000101111110
  6. process 1: b = b | b<<1 | 1 = 11111111111001111111111

First zero: position 11.

Dave
  • 7,460
  • 3
  • 26
  • 39
  • 2
    Note that this is linear time IF the bitvector operations are constant time, which they might not be. – Dave Jan 12 '14 at 21:34
  • 1
    To the best of my knowledge, there aren't any computers that support bitwise operations on arbitrary-width numbers in constant time. This is definitely a cool idea, but I don't think it's really O(n). – templatetypedef Jan 12 '14 at 21:58
  • 1
    @templatetypedef: Fair point. OP answered in comments that the integers were guaranteed to be in the range of [1,10^9], so a sufficiently large bitvector to occupy that entire space could be reserved in constant time at the start. Even without that allowance, doubling reserved size every time allocated space was exceeded should constrain you to O(lg n) allocations. – Conspicuous Compiler Jan 15 '14 at 08:54
  • 1
    @DaveGalvin Is `>>` a shift? Cause that's a shift right not a shift left. Even if it is a shift left, I must not be understanding something, cause in your step 3: `1|8192|1` does not equal 8209. – Jonathan Mee Jan 09 '15 at 14:59
  • @JonathanMee as said in answer, index 1 is at left... If you want to convert that with low bit at right like with integer arithmetic (and you have large integers), it is something like Smalltalk code: (#(4 13 2 3 1) inject: 0 into: [:b :k | b bitOr: (b << k bitOr: (1<<(k-1)))]) bitInvert lowBit 11 – aka.nice Apr 20 '15 at 10:07
  • 1
    @JonathanMee I had written a mirror-universe version of the algorithm! Amazing that nobody else caught that or mentioned it. It's correct now. Thanks! – Dave Nov 01 '15 at 13:55
7

Consider all integers in interval [2i .. 2i+1 - 1]. And suppose all integers below 2i can be formed from sum of numbers from given array. Also suppose that we already know C, which is sum of all numbers below 2i. If C >= 2i+1 - 1, every number in this interval may be represented as sum of given numbers. Otherwise we could check if interval [2i .. C + 1] contains any number from given array. And if there is no such number, C + 1 is what we searched for.

Here is a sketch of an algorithm:

  1. For each input number, determine to which interval it belongs, and update corresponding sum: S[int_log(x)] += x.
  2. Compute prefix sum for array S: foreach i: C[i] = C[i-1] + S[i].
  3. Filter array C to keep only entries with values lower than next power of 2.
  4. Scan input array once more and notice which of the intervals [2i .. C + 1] contain at least one input number: i = int_log(x) - 1; B[i] |= (x <= C[i] + 1).
  5. Find first interval that is not filtered out on step #3 and corresponding element of B[] not set on step #4.

If it is not obvious why we can apply step 3, here is the proof. Choose any number between 2i and C, then sequentially subtract from it all the numbers below 2i in decreasing order. Eventually we get either some number less than the last subtracted number or zero. If the result is zero, just add together all the subtracted numbers and we have the representation of chosen number. If the result is non-zero and less than the last subtracted number, this result is also less than 2i, so it is "representable" and none of the subtracted numbers are used for its representation. When we add these subtracted numbers back, we have the representation of chosen number. This also suggests that instead of filtering intervals one by one we could skip several intervals at once by jumping directly to int_log of C.

Time complexity is determined by function int_log(), which is integer logarithm or index of the highest set bit in the number. If our instruction set contains integer logarithm or any its equivalent (count leading zeros, or tricks with floating point numbers), then complexity is O(n). Otherwise we could use some bit hacking to implement int_log() in O(log log U) and obtain O(n * log log U) time complexity. (Here U is largest number in the array).

If step 1 (in addition to updating the sum) will also update minimum value in given range, step 4 is not needed anymore. We could just compare C[i] to Min[i+1]. This means we need only single pass over input array. Or we could apply this algorithm not to array but to a stream of numbers.

Several examples:

Input:       [ 4 13  2  3  1]    [ 1  2  3  9]    [ 1  1  2  9]
int_log:       2  3  1  1  0       0  1  1  3       0  0  1  3

int_log:     0  1  2  3          0  1  2  3       0  1  2  3
S:           1  5  4 13          1  5  0  9       2  2  0  9
C:           1  6 10 23          1  6  6 15       2  4  4 13
filtered(C): n  n  n  n          n  n  n  n       n  n  n  n
number in
[2^i..C+1]:  2  4  -             2  -  -          2  -  -
C+1:              11                7                5

For multi-precision input numbers this approach needs O(n * log M) time and O(log M) space. Where M is largest number in the array. The same time is needed just to read all the numbers (and in the worst case we need every bit of them).

Still this result may be improved to O(n * log R) where R is the value found by this algorithm (actually, the output-sensitive variant of it). The only modification needed for this optimization is instead of processing whole numbers at once, process them digit-by-digit: first pass processes the low order bits of each number (like bits 0..63), second pass - next bits (like 64..127), etc. We could ignore all higher-order bits after result is found. Also this decreases space requirements to O(K) numbers, where K is number of bits in machine word.

Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
  • Can you please explain how this works for { 1 2 3 9 } and { 1 1 2 9 } – user3187810 Jan 13 '14 at 07:02
  • OK. Several examples added. – Evgeny Kluev Jan 13 '14 at 07:37
  • @EvgenyKluev I'm looking at your examples I can't figure out how your "S:" line is being calculated. In your description you mention prefix sum, but that is certainly not prefix sum. – Jonathan Mee Jan 09 '15 at 15:21
  • @JonathanMee: actually, "C" is prefix sum, not "S". "S[i]" is sum of values from input array having integer logarithm equal to "i". And "C[i]" is sum of values having integer logarithm less or equal to "i". – Evgeny Kluev Jan 09 '15 at 16:10
  • 1
    @EvgenyKluev Thanks for the explanation I now understand `C` and `S`. But I'm stuck again on Step 3. I don't understand what you mean by "next power of 2". – Jonathan Mee Jan 09 '15 at 16:28
  • @JonathanMee: Each element of array C contains sum of values having integer logarithm less or equal to some value. In other words, it considers values below some power of 2. Next power of 2 is obviously twice as large and I mean exactly this value. For example, if all values below 16 are representable and sum of all values below 16 is between 16 and 32, we have to look for input values between 16 and this sum + 1; but if this sum is 32 or greater, all numbers between 16 and 32 are representable, so it does not matter if there are any inputs between 16 and 32. – Evgeny Kluev Jan 09 '15 at 17:19
  • @EvgenyKluev OK, I understand the algorithm now. But I'm having trouble guaranteeing it's true. Is there a proof for "We already know *C*, which is sum of all numbers below 2*i*. If C >= 2*i* + 1 - 1, every number in this interval may be represented as sum of given numbers." How are we so certain of this? – Jonathan Mee Jan 10 '15 at 14:50
  • @JonathanMee: I also assume (but forget to say it explicitly) that all numbers below 2^i are "representable" (because algorithm inspects these intervals sequentially, from smallest to largest). The proof: choose any number between 2^i and C, then sequentially subtract from it all the numbers below 2^i in decreasing order; eventually you get either some number less than the last subtracted number or zero; in both cases this means that chosen number is "representable". – Evgeny Kluev Jan 10 '15 at 16:48
  • @EvgenyKluev If you subtract from a number until the result is negative, how can you be certain that you can form that number *exactly*? I know I'm running you around in circles on this. I've asked a question here for clarification: http://math.stackexchange.com/q/1099359/194115 – Jonathan Mee Jan 11 '15 at 02:55
  • @JonathanMee: Not until the result is negative, we stop one step earlier. If the result is zero, just add together all the subtracted numbers and you have the representation of chosen number. If the result is non-zero and less than the last subtracted number, this result is also less than 2^i, so it is "representable" and none of the subtracted numbers are used for its representation. When you add these subtracted numbers back, you have the representation of chosen number. – Evgeny Kluev Jan 11 '15 at 10:58
  • @EvgenyKluev There is a probable default question to this one [here](http://stackoverflow.com/q/27845636/2642059).) I have implemented your algorithm [there](http://stackoverflow.com/a/27905028/2642059). When you have time to look at it, I would like your blessing. I believe that your solution is the right one and I'm sad that you haven't gotten more upvotes. – Jonathan Mee Jan 12 '15 at 15:11
  • @JonathanMee: I've seen several similar questions lately. Probably it's because of recent contest on CodeChef. As for your implementation, I think copying all input numbers to hashmap is not a good idea. It greatly increases space needed for the algorithm. Note that for step 4 we do not have to process numbers in order of int_log (this is needed only for step 5). Also unordered_multimap seems too slow for this task; I'd prefer something like array of vectors. By the way, finding minimum value for each interval is good idea. Only it is better to be done on first (and only) pass. See my update. – Evgeny Kluev Jan 12 '15 at 17:32
  • @EvgenyKluev Although I believe finding the minimum number could be done in a single loop, however calculating the prefix sum in a single loop would be a mistake. If your small numbers were at the end of your input array you could end up reprocessing every sum you had done, taking you to *O(nlogn)* time. As far as wanting to use a `vector`... I believe using a single `vector` would be difficult, requiring updating all the indices each time an element was inserted. Using multiple `vector`s is possible but is unlikely to offer much space saving over a `multimap`. – Jonathan Mee Jan 12 '15 at 19:35
  • @JonathanMee: I'd prefer array of vectors (one vector for each bit position) not because of space (both vectors and hashes need too much space) but because vectors are (most likely) much faster. As for prefix sum, there is no problem here. Array S has one element for each bit position (like 64 elements), it is ready after you processed last number from input array, only after that you compute its prefix sums (you could even reuse the same array for prefix sums), no "reprocessing every sum" is needed. And other array for minimums. At the end you need only these 2 small arrays to get the result. – Evgeny Kluev Jan 12 '15 at 19:55
0

If you sort the array, it will work for you. Counting sort could've done it in O(n), but if you think in a practically large scenario, range can be pretty high.

Quicksort O(n*logn) will do the work for you:

def smallestPositiveInteger(self, array): 
    candidate = 1
    n = len(array)
    array = sorted(array)
    for i in range(0, n):
        if array[i] <= candidate:
            candidate += array[i]
        else:
            break
    return candidate
LITDataScience
  • 380
  • 5
  • 14
0

Assuming the Array is sorted:

  1. To solve this problem you have to came up with a very generic rule, so my theory is:

The sum of all array elements + 1 is unreachable

This's despite either (array sum + 1) is the smallest integer or not but it isn't reachable anyway, ex:

[1,2,3] => 7   // (sum + 1) is unreachable 
[1,5,7] => 14  // (sum + 1) is unreachable 
  1. Then we have the problem with gaps between the sequence of the numbers. and the important question will be: do we have the required elements to fill the gap?

and the answer is the same! The un-fillable gap is:

Sum + 1 // Already unreachable

solution in pseudocode (assume array is sorted):

result = 1 // in case array is empty or it doesn't start with "1"!

for i = 0 to the end of the array

    if result < array[i] : break;  // The gap is too wide to be filled

    result += number // sum + 1

return result
mwafi
  • 3,946
  • 8
  • 56
  • 83