9

I am trying to tackle this interview question: given an array of unique positive integers, find the smallest possible number to insert into it so that every integer is still unique. The algorithm should be in O(n) and the additional space complexity should be constant. Assigning values in the array to other integers is allowed.

For example, for an array [5, 3, 2, 7], output should be 1. However for [5, 3, 2, 7, 1], the answer should then be 4.

My first idea is to sort the array, then go through the array again to find where the continuous sequence breaks, but sorting needs more than O(n).

Any ideas would be appreciated!

Yassin Hajaj
  • 21,337
  • 9
  • 51
  • 89
maranic
  • 107
  • 4
  • Do you know the max value in your array? – dehasi Jun 10 '19 at 12:30
  • 1
    No, but finding that max value should be in O(n) and need constant space complexity – maranic Jun 10 '19 at 12:43
  • @maranic I can't understand why `[5, 3, 2, 7], the output should be 1` or `[5, 3, 2, 7, 1], the answer should then be 4.`. This might seem naive but it will help me if you can explain or elaborate further on it. Thanks. – mnm Jun 10 '19 at 13:02
  • 1
    @mnm We need to "find the smallest possible number to insert into it so that every integer is still unique". For `[5,3,2,7]`, the smallest possible number is 1, but for the second array, `1,2,3` are all present, so the next possible smalles number is 4. – maranic Jun 10 '19 at 13:06
  • @maranic thanks a lot for the explanation. Now, I understand it. Cheers! – mnm Jun 10 '19 at 13:08
  • 1
    @SurakofVulcan The only rule is that the values in the array should remain positive integers. – maranic Jun 10 '19 at 13:11
  • 1
    From the question description: "Assigning values in the array to other integers is allowed." This is *O(n) space, not constant.* – גלעד ברקן Jun 10 '19 at 13:16
  • 1
    @גלעדברקן Is it? It's just replacing the array no? – Yassin Hajaj Jun 10 '19 at 13:19
  • 1
    @גלעדברקן I should have added constant *additional* space, sorry – maranic Jun 10 '19 at 13:22
  • 2
    @גלעדברקן: no. The space of the input data structure doesn't count. –  Jun 10 '19 at 13:38
  • @YvesDaoust I think you are wrong. it definitely counts when we are using it to implement a solution. – גלעד ברקן Jun 10 '19 at 13:58
  • 1
    @YvesDaoust I thought the same but apparently we are wrong. See chapter 4 of [Papadimitriou](https://theory.cs.princeton.edu/complexity/book.pdf). The input is supposed to be read-only when computing space complexity. – fjardon Jun 10 '19 at 14:56
  • 1
    @fjardon: this is negotiable :-) Anyway, the OP has now specified *additional* space. –  Jun 10 '19 at 14:59
  • Apparently, array length plays a major role and it looks like the answer will never be above `array length + 1`. – nice_dev Jun 10 '19 at 15:26
  • it's called mex(min. excluded) – RiaD Jun 10 '19 at 23:33

7 Answers7

5

My attempt:

The array A is assumed 1-indexed. We call an active value one that is nonzero and does not exceed n.

  • Scan the array until you find an active value, let A[i] = k (if you can't find one, stop);

  • While A[k] is active,

    • Move A[k] to k while clearing A[k];
  • Continue from i until you reach the end of the array.

After this pass, all array entries corresponding to some integer in the array are cleared.

  • Find the first nonzero entry, and report its index.

E.g.

[5, 3, 2, 7], clear A[3]
[5, 3, 0, 7], clear A[2]
[5, 0, 0, 7], done

The answer is 1.

E.g.

[5, 3, 2, 7, 1], clear A[5],
[5, 3, 2, 7, 0], clear A[1]
[0, 3, 2, 7, 0], clear A[3],
[0, 3, 0, 7, 0], clear A[2],
[0, 0, 0, 7, 0], done

The answer is 4.

The behavior of the first pass is linear because every number is looked at once (and immediately cleared), and i increases regularly.

The second pass is a linear search.


A= [5, 3, 2, 7, 1]
N= len(A)

print(A)
for i in range(N):
    k= A[i]
    while k > 0 and k <= N:
        A[k-1], k = 0, A[k-1] # -1 for 0-based indexing
        print(A)

[5, 3, 2, 7, 1]
[5, 3, 2, 7, 0]
[0, 3, 2, 7, 0]
[0, 3, 2, 7, 0]
[0, 3, 0, 7, 0]
[0, 0, 0, 7, 0]
[0, 0, 0, 7, 0]

Update:

Based on גלעד ברקן's idea, we can mark the array elements in a way that does not destroy the values. Then you report the index of the first unmarked.

print(A)
for a in A:
    a= abs(a)
    if a <= N:
        A[a-1]= - A[a-1] # -1 for 0-based indexing
    print(A)

[5, 3, 2, 7, 1]
[5, 3, 2, 7, -1]
[5, 3, -2, 7, -1]
[5, -3, -2, 7, -1]
[5, -3, -2, 7, -1]
[-5, -3, -2, 7, -1]
  • In my example I don't. Where do you see that ? If the array was `[7, 3, 2, 5, 1]`, the 1 would clear it. –  Jun 10 '19 at 14:24
  • 1
    My answer performs the same idea, which is to *mark seen values*, just avoids using an additional pointer. – גלעד ברקן Jun 10 '19 at 14:25
  • 1
    @YassinHajaj: the answer is 1, as correctly reported by my solution. –  Jun 10 '19 at 14:26
  • 1
    I don't understand the downvotes, this works well and is O(n) – aka.nice Jun 10 '19 at 15:12
  • @aka.nice: yep. On second thoughts, it is easier to mark the array elements non-destructively like גלעד ברקן, playing wiht the sign, so that you just do with two forward loops. –  Jun 10 '19 at 15:17
  • I agree, it's very nice solution. Your solution is closer to mine (with the idea of swapping). – aka.nice Jun 10 '19 at 15:32
  • @aka.nice: I like these swaps as well, they are also non-destructive. –  Jun 10 '19 at 15:34
4

From the question description: "Assigning values in the array to other integers is allowed." This is O(n) space, not constant.

Loop over the array and multiply A[ |A[i]| - 1 ] by -1 for |A[i]| < array length. Loop a second time and output (the index + 1) for the first cell not negative or (array length + 1) if they are all marked. This takes advantage of the fact that there could not be more than (array length) unique integers in the array.

גלעד ברקן
  • 23,602
  • 3
  • 25
  • 61
  • Because the input space doesn’t count. Compare things like quickselect. If you could count this as O(n) space, then any actual auxiliary O(n) space wouldn’t change the space complexity… – Ry- Jun 10 '19 at 13:48
  • @Ry but we are reusing the space assigned to the input for the solution. That means the solution implementation is using O(n) space. – גלעד ברקן Jun 10 '19 at 13:49
  • 1
    Very good indeed. Maybe indicate that you assume indexing from 0 – Damien Jun 10 '19 at 14:53
1

I will use 1-based indexing.

The idea is to reuse input collection and arrange to swap integer i at ith place if its current position is larger than i. This can be performed in O(n).

Then on second iteration, you find the first index i not containing i, which is again O(n).

In Smalltalk, implemented in Array (self is the array):

firstMissing
    self size to: 1 by: -1 do: [:i |
        [(self at: i) < i] whileTrue: [self swap: i with: (self at: i)]].
    1 to: self size do: [:i |
        (self at: i) = i ifFalse: [^i]].
    ^self size + 1

So we have two loops in O(n), but we also have another loop inside the first loop (whileTrue:). So is the first loop really O(n)?

Yes, because each element will be swapped at most once, since they will arrive at their right place. We see that the cumulated number of swap is bounded by array size, and the overall cost of first loop is at most 2*n, the total cost incuding last seatch is at most 3*n, still O(n).

You also see that we don't care to swap case of (self at: i) > i and: [(self at:i) <= self size], why? Because we are sure that there will be a smaller missing element in this case.

A small test case:

| trial |
trial := (1 to: 100100) asArray shuffled first: 100000.
self assert: trial copy firstMissing = trial sorted firstMissing.
aka.nice
  • 9,100
  • 1
  • 28
  • 40
  • This works only if the size of the input is at least as large as the maximal value in the array. – fjardon Jun 10 '19 at 14:43
  • @fjardon No. I swap index `i` and index `self at: i` only if `(self at: i) < i`. I thus never write past `self size` (and could not in Smalltalk, such buffer overrun would cause an exception). – aka.nice Jun 10 '19 at 15:00
0

use this short and sweet algorithm:

A is [5, 3, 2, 7]
1- Define B With Length = A.Length;                            (O(1))
2- initialize B Cells With 1;                                  (O(n))
3- For Each Item In A:
        if (B.Length <= item) then B[Item] = -1                (O(n))
4- The answer is smallest index in B such that B[index] != -1  (O(n))
Hamed
  • 150
  • 2
  • 16
0

You could do the following.

  • Find the maximum (m), sum of all elements (s), number of elements (n)
  • There are m-n elements missing, their sum is q = sum(1..m) - s - there is a closed-form solution for the sum
  • If you are missing only one integer, you're done - report q
  • If you are missing more than one (m-n), you realize that the sum of the missing integers is q, and at least one of them will be smaller than q/(m-n)
  • You start from the top, except you will only take into account integers smaller than q/(m-n) - this will be the new m, only elements below that maximum contribute to the new s and n. Do this until you are left with only one missing integer.

Still, this may not be linear time, I'm not sure.

Surak of Vulcan
  • 348
  • 2
  • 11
0

EDIT: you should use the candidate plus half the input size as a pivot to reduce the constant factor here – see Daniel Schepler’s comment – but I haven’t had time to get it working in the example code yet.

This isn’t optimal – there’s a clever solution being looked for – but it’s enough to meet the criteria :)

  1. Define the smallest possible candidate so far: 1.
  2. If the size of the input is 0, the smallest possible candidate is a valid candidate, so return it.
  3. Partition the input into < pivot and > pivot (with median of medians pivot, like in quicksort).
  4. If the size of ≤ pivot is less than pivot itself, there’s a free value in there, so start over at step 2 considering only the < pivot partition.
  5. Otherwise (when it’s = pivot), the new smallest possible candidate is the pivot + 1. Start over at step 2 considering only the > pivot partition.

I think that works…?

'use strict';

const swap = (arr, i, j) => {
    [arr[i], arr[j]] = [arr[j], arr[i]];
};

// dummy pivot selection, because this part isn’t important
const selectPivot = (arr, start, end) =>
    start + Math.floor(Math.random() * (end - start));

const partition = (arr, start, end) => {
    let mid = selectPivot(arr, start, end);
    const pivot = arr[mid];
    swap(arr, mid, start);
    mid = start;

    for (let i = start + 1; i < end; i++) {
        if (arr[i] < pivot) {
            mid++;
            swap(arr, i, mid);
        }
    }

    swap(arr, mid, start);
    return mid;
};

const findMissing = arr => {
    let candidate = 1;
    let start = 0;
    let end = arr.length;

    for (;;) {
        if (start === end) {
            return candidate;
        }

        const pivotIndex = partition(arr, start, end);
        const pivot = arr[pivotIndex];

        if (pivotIndex + 1 < pivot) {
            end = pivotIndex;
        } else {
            //assert(pivotIndex + 1 === pivot);
            candidate = pivot + 1;
            start = pivotIndex + 1;
        }
    }
};

const createTestCase = (size, max) => {
    if (max < size) {
        throw new Error('size must be < max');
    }

    const arr = Array.from({length: max}, (_, i) => i + 1);
    const expectedIndex = Math.floor(Math.random() * size);
    arr.splice(expectedIndex, 1 + Math.floor(Math.random() * (max - size - 1)));

    for (let i = 0; i < size; i++) {
        let j = i + Math.floor(Math.random() * (size - i));
        swap(arr, i, j);
    }

    return {
        input: arr.slice(0, size),
        expected: expectedIndex + 1,
    };
};

for (let i = 0; i < 5; i++) {
    const test = createTestCase(1000, 1024);
    console.log(findMissing(test.input), test.expected);
}
Ry-
  • 218,210
  • 55
  • 464
  • 476
  • Or - on an input of length n you know the output is at most n+1, so pivot at n/2 etc. and do a binary search for the output value. Then T(n) = O(n) + T(n/2) so T(n) = O(n). – Daniel Schepler Jun 10 '19 at 21:05
  • @DanielSchepler: I’m not sure what you mean by a binary search compared to what this is already, but n/2 is a better choice of pivot, thanks. – Ry- Jun 10 '19 at 21:09
  • You were mentioning median-of-median partition selection. My idea was to partition based on possible output values not partition the list into rough halves "spatially". – Daniel Schepler Jun 10 '19 at 21:10
0

The correct method I almost got on my own, but I had to search for it, and I found it here: https://www.geeksforgeeks.org/find-the-smallest-positive-number-missing-from-an-unsorted-array/

Note: This method is destructive to the original data

Nothing in the original question said you could not be destructive.

I will explain what you need to do now.

The basic "aha" here is that the first missing number must come within the first N positive numbers, where N is the length of the array.

Once you understand this and realize you can use the values in the array itself as markers, you just have one problem you need to address: Does the array have numbers less than 1 in it? If so we need to deal with them.

Dealing with 0s or negative numbers can be done in O(n) time. Get two integers, one for our current value, and one for the end of the array. As we scan through, if we find a 0 or negative number, we perform a swap using the third integer, with the final value in the array. Then we decrement our end of an array pointer. We continue until our current pointer is past the end of the array pointer.

Code example:

while (list[end] < 1) {
   end--;
}
while (cur< end) {
   if (n < 1) {
      swap(list[cur], list[end]);
      while (list[end] < 1) {
         end--;
      }
   }
}

Now we have the end of the array, and a truncated array. From here we need to see how we can use the array itself. Since all numbers that we care about are positive, and we have a pointer to the position of how many of them there are, we can simply multiply one by -1 to mark that place as present if there was a number in the array there.

e.g. [5, 3, 2, 7, 1] when we read 3, we change it to [5, 3, -2, 7, 1]

Code example:

for (cur = 0; cur <= end; begin++) {
   if (!(abs(list[cur]) > end)) {
      list[abs(list[cur]) - 1] *= -1;
   }
}

Now, note: You need to read the absolute value of the integer in the position because it might be changed to be negative. Also note, if an integer is greater than your end of list pointer, do not change anything as that integer will not matter.

Finally, once you have read all the positive values, iterate through them to find the first one that is currently positive. This place represents your first missing number.

Step 1: Segregate 0 and negative numbers from your list to the right. O(n)
Step 2: Using the end of list pointer iterate through the entire list marking
        relevant positions negative. O(n-k)
Step 3: Scan the numbers for the position of the first non-negative number. O(n-k)
Space Complexity: The original list is not counted, I used 3 integers beyond that. So
        it is O(1)

One thing I should mention is the list [5, 4, 2, 1, 3] would end up [-5, -4, -2, -1, -3] so in this case, you would choose the first number after the end position of the list, or 6 as your result.

Code example for step 3:

for (cur = 0; cur < end; cur++) {
   if (list[cur] > 0) {
      break;
   }
}
print(cur);
Farzad Karimi
  • 770
  • 1
  • 12
  • 31
Chthonic One
  • 213
  • 3
  • 11