Find two missing numbers

Question

We have a machine with O(1) memory and we want to pass n numbers (one by one) in the first pass, and then we exclude the two numbers and we will pass n-2 numbers to the machine.

write an algorithm that finds missing numbers.

are the numbers supposed to be a series? [1,....,n] and you need to find the missing element? — amit, Apr 18 '12 at 22:12
I'm sorry for a dumb question, but what is O(1) memory? Is it memory which can be randomly accessed in a constant time? — Dmytro Shevchenko, Apr 18 '12 at 22:13
Do you get the numbers in the same order? Might it be, that you get 5 numbers, and can store 3 of them, and then you get 500 numbers, and may store again 3 of them? — user unknown, Apr 18 '12 at 22:14
@amin k, By `integer` did you meant `int` in c++?[I mean 32 bit integer as constant variable]. — Saeed Amiri, Apr 18 '12 at 22:24
possible duplicate of [Easy interview question got harder: given numbers 1..100, find the missing number(s)](http://stackoverflow.com/questions/3492302/easy-interview-question-got-harder-given-numbers-1-100-find-the-missing-numbe). Worth reading, thorough treatment. — jpalecek, Apr 18 '12 at 22:27
@jpalecek That's not a duplicate. The OP said (in a comment) that the numbers aren't in a range. — sepp2k, Apr 18 '12 at 22:30
@saeed amiri:no,u assumed that n! can be saved by O(1) memory but...it is not optimal — amin k, Apr 18 '12 at 22:38

score 11 · Accepted Answer · answered Apr 19 '12 at 06:43

11

It can be done with O(1) memory.

You only need a few integers to keep track of some running sums. The integers do not require log n bits (where n is the number of input integers), they only require 2b+1 bits, where b is the number of bits in an individual input integer.

When you first read the stream add all the numbers and all of their squares, i.e. for each input number, n, do the following:

sum += n
sq_sum += n*n

Then on the second stream do the same thing for two different values, sum2 and sq_sum2. Now do the following maths:

sum - sum2 = a + b
sq_sum - sq_sum2 = a^2 + b^2

(a + b)(a + b) = a^2 + b^2 + 2ab
(a + b)(a + b) - (a^2 + b^2) = 2ab
(sum*sum - sq_sum) = 2ab

(a - b)(a - b) = a^2 + b^2 - 2ab
               = sq_sum - (sum*sum - sq_sum) = 2sq_sum - sum*sum
sqrt(2sq_sum - sum*sum) = sqrt((a - b)(a - b)) = a - b
((a + b) - (a - b)) / 2 = b
(a + b) - b = a

You need 2b+1 bits in all intermediate results because you are storing products of two input integers, and in one case multiplying one of those values by two.

answered Apr 19 '12 at 06:43

Running Wild

2,951
18
15

Assuming the numbers are distinct positive integers [which is a valid input], then the largest number is at least `n`, which requires `logn` bits to represent - thus the answer is still `O(logn)` space, since **`b` itself is linear in `logn`**. Also: The OP did mention in comments **the numbers are not necesseraly from a certain range**. I agree it is `O(1)` if we can assume integers are at most 32 bits [or any other constant number...] – amit Apr 19 '12 at 07:33
p.s. this assumption [constant number of bits] is contradicting the `O(1)` incapability because then the statement "input is not limited" is not true, there are finite number of possible answers if ints have a constant number of bits. Without this assumption, the `O(1)` infeasibility stands. – amit Apr 19 '12 at 08:13
1

To say that this problem can't be solved in constant space is to say that we can't hold a single integer from the input in memory at a time, in which case the problem is just being silly. Since we clearly must have this much memory, call it M, it is within the O(1) constraint to require kM memory, and for this solution k is about 10. – Running Wild Apr 19 '12 at 14:38
i guess that function of sq_sum2 is enogh,because i guess Equation of x^2+Y^2=c has a unique answer in integer domain...but i cant prove this problem – amin k Apr 19 '12 at 20:19
1

@amink, it's not true that x^2 + y^2 = c has a unique answer in integers. For example, if c is 4225, then (x,y) can be any of (16,63), (25, 60), (33,56), (39,52) not counting their permutations. – Chandranshu Aug 07 '13 at 05:13

score 3 · Answer 2 · answered Apr 18 '12 at 22:13

3

Assuming the numbers are ranging from 1..N and 2 of them are missing - x and y, you can do the following:

Use Gauss formula: sum = N(N+1)/2

sum - actual_sum = x + y

Use product of numbers: product = 1*2..*N = N!

product - actual_product = x * y

Resolve x,y and you have your missing numbers.

In short - go through the array and sum up each element to get the actual_sum, multiply each element to get actual_product. Then resolve the two equations for x an y.

answered Apr 18 '12 at 22:13

BrokenGlass

158,293
28
286
335

1

sum of N numbers cannot be stored in O(1) memory. – Karoly Horvath Apr 18 '12 at 22:14
This is O(logn) memory, at best. – amit Apr 18 '12 at 22:15
1

I agree, but I think this is what the interviewer *meant* to ask - I don't think you can do better. – BrokenGlass Apr 18 '12 at 22:16
2

He also indicated [in comments] the numbers are not ranged [1,...,n]. I believe it's a kind of a trick question to show the interviewer you understand architecture limitations. – amit Apr 18 '12 at 22:18
@amit how do you apply the O() notation to memory? How did you calculate it's O(logn)? I'm feeling dumb, please give me an article to read, or explain shortly. – Dmytro Shevchenko Apr 18 '12 at 22:18
@amit You're given the numbers twice, so you can build the sum, product, xor or whatever you need on the first pass, then compute them again on the second one and set up a pair of equations to solve. – Pablo Apr 18 '12 at 22:20
@BrokenGlass The way I understood it, they'll give you the all the numbers once, but you can't use memory to store them all, only checksums (like you have done), and then you're given the list again with two numbers missing... so I'd say you're on the right track ;) – Pablo Apr 18 '12 at 22:21
1

@Shedal: the sum of n positive integers is at least `n`, which needs `logn` bits to be represented. – amit Apr 18 '12 at 22:21
2

@Shedal The same way you apply it to runtime. If, for some constants `c` and `N`, an algorithm consumes at most `c*f(n)` bytes of memory when processing input of size `n` for all `n > N`, the algorithm's memory consumption is in `O(f(n))`. – sepp2k Apr 18 '12 at 22:22
n! is too big for memory usage – amin k Apr 18 '12 at 22:27
@KarolyHorvath By O(1) the interviewer probably meant constant size memory. @amink Instead of multiplying, calculate the sum of squares in the second run. Then you have `X + Y` and `X^2 + Y^2`. – Hindol Apr 19 '12 at 02:48
@Hindol: yes, that's what I meant too. anything you wanted to tell/suggest? – Karoly Horvath Apr 19 '12 at 06:58
@amink What `Running Wild` answered, I also meant exactly that, :). – Hindol Apr 20 '12 at 03:07
@hindol : i guess that function of sq_sum2 is enogh,because i guess Equation of x^2+Y^2=c has a unique answer in integer domain...but i cant prove this problem.....but i feel the Running wild answer is better than others,what s your idea? – amin k Apr 20 '12 at 12:56
@amink Yes, not only `x^2+Y^2=c` you also have `x+y=d`. Replace y with d-x in the former and you'll get x,y belongs to {m,n} (say). Also note x>=0, y>=0. Further assume without any loss of generality x – Hindol Apr 25 '12 at 07:37

score 3 · Answer 3 · answered Apr 18 '12 at 22:14

3

It cannot be done with O(1) memory.

Assume you have a constant k bits of memory - then you can have 2^k possible states for your algorithm.

However - input is not limited, and assume there are (2^k) + 1 possible answers for (2^k) + 1 different problem cases, from piegeonhole principle, you will return the same answer twice for 2 problems with different answers, and thus your algorithm is wrong.

answered Apr 18 '12 at 22:14

amit

175,853
27
231
333

1

But if numbers are in integer range may be constant number of 32 bit integers can be assumed in O(1) [I say this by his c++ tag, and seems it's not pute CS question] – Saeed Amiri Apr 18 '12 at 22:19
1

@SaeedAmiri: If such an assumption exists - it should be explicit, rather then implicit in these type of questions I think. – amit Apr 18 '12 at 22:21
I agreed, so I asked, even by such an assumption I couldn't come up to any algorithm :) – Saeed Amiri Apr 18 '12 at 22:29
Anyway your proof is not true, e.g backtracking algorithm usually are P-SPACE, e.g n space, but their possible states can be 2^2^2^..... – Saeed Amiri Apr 18 '12 at 22:36
@SaeedAmiri Why is it contradicting? P-Space allows linear, quadric or any polynomial amount of space, let it be `p(n)`, thus you can have `2^p(n)` amount of states. can you give a concrete counter-example where this solution fails? Or can you give an algorithm with 2^2^2^...n states that uses polynomial memory? – amit Apr 18 '12 at 22:38
No we can have 2^2^2^..... states, you should prove your claim, your contradiction is simply wrong by algorithms which working by permutations, they are usually Omega(n!) and O(2^P(n)) does not belong to Omega(n!). Also non-deterministic algorithms, all of them are contradiction to your proof. Anywway I mean you can't say we have 2^k possible states. – Saeed Amiri Apr 18 '12 at 22:42
@SaeedAmiri: why? `2^(nlogn) = Theta(n!)`. I still don't agree on the claim that we can have more then 2^k states, but let's agree you have `O(1)` states for `O(1)` memory - so there is a constant C which is bigger, and then the rest of the solution holds. – amit Apr 18 '12 at 22:47
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/10248/discussion-between-saeed-amiri-and-amit) – Saeed Amiri Apr 18 '12 at 22:50
Is finding k missing numbers becomes tougher? – niks Jan 29 '16 at 09:59

score 2 · Answer 4 · answered Apr 19 '12 at 06:42

The following came to my mind as soon as I finished reading the question. But the answers above suggest that it is not possible with O(1) memory or that there should be a constraint on the range of numbers. Tell me if my understanding of the question is wrong. Ok, so here goes

You have O(1) memory - which means you have constant amount of memory.

When the n numbers are passed to you 1st time, just keep adding them in one variable and keep multiplying them in another. So at the end of 1st pass you have the sum and product of all the numbers in 2 variables S1 and P1. You have used 2 variable till now (+1 if you reading the numbers in memory).

When the (n-2) numbers are passed to you the second time, do the same. Store the sum and product of the (n-2) numbers in 2 other variables S2 and P2. You have used 4 variables till now (+1 if you reading the numbers in memory).

If the two missing numbers are x and y, then

x + y = S1 - S2
x*y = P1/P2;

You have two equations in two variables. Solve them.

So you have used a constant amount of memory (independent of n).

This is similar to the approach suggested by @BrokenGlass - read the comments there and see why it is not `O(1)` memory. Note that this is not even linear in `n` - since the product of all elements is O(`n!`) [assuming unique elements], and you need `O(log(n!)) = O(nlogn)` to represent it. — amit, Apr 19 '12 at 07:40

score 1 · Answer 5 · answered Jun 21 '16 at 19:41

void Missing(int arr[], int size)
{
  int xor = arr[0]; /* Will hold xor of all elements */
  int set_bit_no;  /* Will have only single set bit of xor */
  int i;
  int n = size - 2;
  int x = 0, y = 0;

  /* Get the xor of all elements in arr[] and {1, 2 .. n} */
  for(i = 1; i < size; i++)
    xor ^= arr[i];  
  for(i = 1; i <= n; i++)
    xor ^= i;   

  /* Get the rightmost set bit in set_bit_no */
  set_bit_no = xor & ~(xor-1);

  /* Now divide elements in two sets by comparing rightmost set
   bit of xor with bit at same position in each element. */
  for(i = 0; i < size; i++)
  {
    if(arr[i] & set_bit_no)
      x = x ^ arr[i]; /*XOR of first set in arr[] */
    else
      y = y ^ arr[i]; /*XOR of second set in arr[] */
  }
  for(i = 1; i <= n; i++)
  {
    if(i & set_bit_no)
      x = x ^ i; /*XOR of first set in arr[] and {1, 2, ...n }*/
    else
      y = y ^ i; /*XOR of second set in arr[] and {1, 2, ...n } */
  }

  printf("\n The two repeating missing elements are are %d & %d ", x, y);
}

score 0 · Answer 6 · answered Sep 05 '18 at 22:24

0

Please look at the solution link below. It explains an XOR method. This method is more efficient than any of the methods explained above. It might be the same as Victor above, but there is an explanation as to why this works.

Solution here

answered Sep 05 '18 at 22:24

thenakulchawla

5,024
7
30
42

score 0 · Answer 7 · answered Jun 07 '20 at 18:09

Here is the simple solution which does not require any quadratic formula or multiplication:

Let say B is the sum of two missing numbers.

The set of two missing numbers will be one from: (1,B-1),(2,B-1)...(B-1,1)

Therefore, we know that one of those two numbers will be less than or equal to the half of B.

We know that we can calculate the B (sum of both missing number).

So, once we have B, we will find the sum of all numbers in the list which are less than or equal to B/2 and subtract that from the sum of (1 to B/2) to get the first number. And then, we get the second number by subtracting first number from B. In below code, rem_sum is B.

public int[] findMissingTwoNumbers(int [] list, int N){
    if(list.length == 0 || list.length != N - 2)return new int[0];

    int rem_sum = (N*(N + 1))/2;
    for(int i = 0; i < list.length; i++)rem_sum -= list[i];

    int half = rem_sum/2;
    if(rem_sum%2 == 0)half--; //both numbers cannot be the same
    int rem_half = getRemHalf(list,half);

    int [] result = {rem_half, rem_sum - rem_half};
    return result;
}
private int getRemHalf(int [] list, int half){
    int rem_half = (half*(half + 1))/2;
    for(int i = 0; i < list.length; i++){
        if(list[i] <= half)rem_half -= list[i];
    }
    return rem_half;
}

Find two missing numbers

7 Answers7

Linked

Related