Finding missing elements in an array

Question

Given you have an array A[1..n] of size n, it contains elements from the set {1..n}. However, two of the elements are missing, (and perhaps two of the array elements are repeated). Find the missing elements.

Eg if n=5, A may be A[5] = {1,2,1,3,2}; and so the missing elements are {4,5}

The approach I used was:

int flag[n] = {0};  
int i;  
for(i = 0; i < n; i++)  {  
  flag[A[i]-1] = 1;  
 }  

for(i = 0; i < n; i++)  {  
 if(!flag[i]) {  
    printf("missing: %d", (i+1));  
}

the space complexity comes to O(n). I feel this is a very kiddish and inefficient code. So could you please provide a better algo with better space and time complexity.

You can't do it in less than `O(n)` time, because you have to look at every element at least once. However there might be some solutions that use less memory. — Georg Schölly, Mar 09 '11 at 17:56
`flag[A[i]] = 1;` is a no-no! According to your post, each `A[i]` has a value between 1 and n; `flag[n]` is undefined (the indexes for flag go from `0` to `n-1`). Oh ... and tyou're trying to access `A[0]` which, according to your post, should not be considred; and you're failing to access `A[n]` (last `A` you look at is `A[n-1]`). OTOH, if `A` is defined as `A[5] = {1, 2, 1, 3, 2}` **there is an inconsistency in the description**. — pmg, Mar 09 '11 at 18:09
oops sorry.. though you should have understood its supposed to be flag[A[i]-1] = 1; thanks will modify it.. — letsc, Mar 09 '11 at 18:11
possible duplicates: http://stackoverflow.com/q/2348731/10396, and http://stackoverflow.com/q/2590165/10396 — AShelly, Mar 09 '11 at 18:28
Definite duplicate of Q2 in http://stackoverflow.com/questions/3492302/easy-interview-question-got-harder-given-numbers-1-100-find-the-missing-number — AShelly, Mar 09 '11 at 18:32
This problem has a similar feel to the ['Barrels of Pellets'](http://www.reocities.com/oosterwal/puzzle/pellets02q.html) puzzle. I think there is a fairly simple solution that runs in O(n) time and O(1) space that involves summing or hashing the elements of the array and comparing that value to the expected sum or hash of the set. The difference should tell you which elements are missing and which, if any, elements are repeated. — oosterwal, Mar 10 '11 at 01:47

score 15 · Accepted Answer · 2011-03-11T08:02:21.513

Theoretically,

It is possible to do in O(1) space (in RAM model, i.e. O(1) words) and O(n) time even with a read-only array.

Warning: Long post with some mathematics. If you are only interested in the code and not the algorithm/proof, skip to the code section. You would need to read some parts of the algorithm section to understand the code, though.

Algorithm

Assume the missing numbers are x and y.

There are two possibilities for the array:

1) One number is repeated three times, and the remaining numbers in the array appear exactly once.

For this case, the bucketed XOR trick will work.

Do a XOR of all elements of the array with 1,2,...,n.

You end up with z = x XOR y.

There is at least one bit of z which is non-zero.

Now differentiating the elements of the array based on that bit (two buckets) do a XOR pass through the array again.

You will end up with x and y.

Once you have the x and y, you can confirm if these are indeed the missing elements.

If it so happens that the confirmation step fails, then we must have the second case:

2) Two elements repeated exactly twice and the rest appearing exactly once.

Let the two repeated elements be a and b (x and y are the missing ones).

Warning: Math ahead.

Let S_k = 1^k + 2^k + .. + n^k

For instance S_1 = n(n+1)/2, S_2 = n(n+1)(2n+1)/6 etc.

Now we compute seven things:

T_1 = Sum of the elements of the array = S_1 + a + b - x - y.
T_2 = Sum of the squares = S_2 + a^2 + b^2 - x^2 - y^2
T_3 = Sum of cubes = S_3 + a^3 + b^3 - x^3 - y^3
T_4 = Sum of fourth powers = S_4 + a^4 + b^4 - x^4 - y^4
...
T_7 = Sum of seventh powers = S_7 + a^7 + b^7 - x^7 - y^7

Note, we can use O(1) words (intsead of one) to deal with the overflow issues. (I estimate 8-10 words will be enough).

Let Ci = T_i - S_i

Now assume that a,b,x,y are the roots of the 4th degree polynomial P(z) = z^4 + pz^3 + qz^2 + rz + s

Now we try to transform the above seven equations into four linear equations in p,q,r,s.

For instance, if we do 4th Eqn + p * 3rd Eqn + q* 2nd equation + r* 1st equation

we get

C4 + p*C3 + q*C2 + r*C1 = 0

Similarly we get

C5 + p*C4 + q*C3 + r*C2 + s*C1 = 0
C6 + p*C5 + q*C4 + r*C3 + s*C2 = 0
C7 + p*C6 + q*C5 + r*C4 + s*C3 = 0

These are four linear equations in p,q,r,s which can be solved by Linear Algebra techniques like Gaussian Elimination.

Note that p,q,r,s will be rationals and so can be computed only with integer arithmetic.

Now suppose we are given a solution p,q,r,s to the above set of equations.

Consider P(z) = z^4 + pz^3 + qz^2 + rz + s.

What the above equations are saying is basically

P(a) + P(b) - P(x) - P(y) = 0
aP(a) + bP(b) - xP(x) -yP(y) = 0
a^2 P(a) + b^2 P(b) - x^2 P(x) - y^2 P(y)  = 0
a^3 P(a) + b^3 P(b) - x^3 P(x) - y^3 P(y) = 0

Now the matrix

   1   1  -1 -1
   a   b   -x   -y
   a^2 b^2 -x^2 -y^2
   a^3 b^3 -x^3 -y^3

has the same determinant as the Vandermonde matrix and thus is invertible, if a,b,x,y are distinct.

Thus we must have that P(a) = P(b) = P(x) = P(y) = 0.

Now check which of 1,2,3,...,n are roots of x^4 + px^3 + qx^2 + rx + s = 0.

Thus this is a linear time constant space algorithm.

Code

I wrote the following C# (.Net 4.0) code and it seems to work for the few samples I tried... (Note: I didn't bother catering to case 1 above).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using System.Numerics;

namespace SOManaged
{
    class Program
    {
        static void Main(string[] args)
        {
            ulong[] inp = {1,3,2,1,2};
            ulong[] inp1 = { 1,2,3,4,5,6,7,8,
                             9,10,11,13,14,15,
                             16,17,18,19,20,21,5,14};

            int N = 100000;
            ulong[] inp2 = new ulong[N];
            for (ulong i = 0; i < (ulong)N; i++)
            {
                inp2[i] = i+1;
            }
            inp2[122] = 44;
            inp2[419] = 13;

            FindMissingAndRepeated(inp);
            FindMissingAndRepeated(inp1);
            FindMissingAndRepeated(inp2);
        }

        static void FindMissingAndRepeated(ulong [] nums)
        {
            BigInteger[] C = new BigInteger[8];

            // Compute the C_i
            for (int k = 0; k < 8; k++)
            {
                C[k] = 0;
            }

            BigInteger i = 1;
            BigInteger n = 0;

            for (int j = 0; j < nums.Length; j++)
            {
                n = nums[j];
                i = j + 1;
                for (int k = 1; k < 8; k++)
                {
                    C[k] += i - n;
                    n = n * nums[j];
                    i = i * (j + 1);
                }
            }


            for (int k = 1; k <= 7; k++)
            {
                Console.Write("C[" + k.ToString() + "] = " + 
                               C[k].ToString() +", ");
            }
            Console.WriteLine();

            // Solve for p,q,r,s
            BigInteger[] pqrs = new BigInteger[4];
            BigInteger[] constants = new BigInteger[4];
            BigInteger[,] matrix = new BigInteger[4, 4];

            int start = 4;
            for (int row = 0; row < 4; row++ )
            {
                constants[row] = -C[start];

                int k = start-1;
                for (int col = 0; col < 4; col++)
                {
                    matrix[row, col] = C[k];
                    k--;
                }

                start++;
            }

            Solve(pqrs, matrix, constants, 4);

            for (int k = 0; k < 4; k++)
            {
                Console.Write("pqrs[" + k.ToString() + "] = " 
                               + pqrs[k].ToString() + ", ");
            }
            Console.WriteLine();

            // Find the roots.
            for (int k = 1; k <= nums.Length; k++)
            {
                BigInteger x = new BigInteger(k);
                BigInteger p_k = x * x * x* x + pqrs[0] * x* x * x 
                                 + pqrs[1] * x * x + pqrs[2] * x 
                                 + pqrs[3];

                if (p_k == 0)
                {
                    Console.WriteLine("Found: " + k.ToString());
                }
            }
        }

        // Solve using Cramer's method.
        // matrix * pqrs = constants.
        static void Solve(BigInteger[] pqrs, BigInteger[,] matrix, 
                          BigInteger[] constants, int n)
        {
            BigInteger determinant = Determinant(matrix, n);

            for (int i = 0; i < n; i++)
            {
                BigInteger[,] numerator = Replace(matrix, constants, n, i);
                BigInteger numDet = Determinant(numerator,4);
                pqrs[i] = numDet/ determinant;
            }
        }

        // Replace a column of matrix with constants.
        static BigInteger[,] Replace(BigInteger[,] matrix, 
                           BigInteger[] constants, int n, int col)
        {
            BigInteger[,] newMatrix = new BigInteger[n, n];
            for (int i = 0; i < n; i++)
            {
                for (int j = 0; j < n; j++)
                {
                    if (j != col)
                    {
                        newMatrix[i, j] = matrix[i, j];
                    }
                    else
                    {
                        newMatrix[i, j] = constants[i];
                    }
                }
            }

            return newMatrix;
        }

        // Recursively compute determinant for matrix.
        static BigInteger Determinant(BigInteger[,] matrix, int n)
        {
            BigInteger determinant = new BigInteger(0);
            int multiplier = 1;

            if (n == 1)
            {
                return matrix[0,0];
            }

            for (int i = 0; i < n; i++)
            {
                BigInteger [,] subMatrix = new BigInteger[n-1,n-1];
                int row = 0;
                for (int j=1; j < n; j++)
                {
                    int col = 0;
                    for (int k = 0; k < n ; k++)
                    {
                        if (k == i)
                        {
                            continue;
                        }
                        subMatrix[row,col] = matrix[j,k];
                        col++;
                    }
                    row++;
                }

                BigInteger subDeterminant = Determinant(subMatrix, n - 1);
                determinant += multiplier * subDeterminant * matrix[0,i];
                multiplier = -multiplier;
            }

            return determinant;
        }
    }
}

The output is

C[1] = 6, C[2] = 36, C[3] = 180, C[4] = 864, C[5] = 4116, C[6] = 19656, C[7] = 9
4380,
pqrs[0] = -12, pqrs[1] = 49, pqrs[2] = -78, pqrs[3] = 40,
Found: 1
Found: 2
Found: 4
Found: 5


C[1] = 15, C[2] = 407, C[3] = 9507, C[4] = 215951, C[5] = 4861515, C[6] = 108820
727, C[7] = 2424698067,
pqrs[0] = -53, pqrs[1] = 980, pqrs[2] = -7396, pqrs[3] = 18480,
Found: 5
Found: 12
Found: 14
Found: 22


C[1] = 486, C[2] = 189424, C[3] = 75861486, C[4] = 31342069984, C[5] = 130971109
69326, C[6] = 5492487308851024, C[7] = 2305818940736419566,
pqrs[0] = -600, pqrs[1] = 83183, pqrs[2] = -3255216, pqrs[3] = 29549520,
Found: 13
Found: 44
Found: 123
Found: 420

@Georg: Here is some data. For the example problem `1,1,2,2,3`: `c1 = 6 C2 = 36 c3 = 180 C4 = 864 C5 = 4116 C6 = 19656 C7 = 94380` Solving gives `p=-12, q= 49, r = -78, s= 40`. THe corresponding polynomial is `(z-1)(z-2)(z-4)(z-5)` — , Mar 10 '11 at 02:08
I used a calculator and this site: http://wims.unice.fr/wims/wims.cgi I currently don't have time to write a full program, unfortunately. — , Mar 10 '11 at 02:09
@Georg: I wrote some code and have edited the answer with it. Ideally it should be the other way round... Code requiring proof, not proof requiring code! :-) — , Mar 11 '11 at 07:57
@Moron: I've upvoted your answer long ago anyway. :) I'm quite impressed by your dedication to this answer. — Georg Schölly, Mar 11 '11 at 09:31
@Georg: Yeah I know you were the upvoter :-) It was an interesting challenge, but if you hadn't commented, I wouldn't have bothered writing the code. So, thanks! — , Mar 11 '11 at 14:28
@smartmuki: How about proper English grammar? It's just much nicer. See also this question on meta: http://meta.stackexchange.com/questions/56061/is-atrocious-english-a-possible-sign-of-rudeness — Georg Schölly, Mar 13 '11 at 10:22
When taking elements to the seventh-power above, how can you put any O(1) bound on the necessary word size when the size of the elements will grow with N? Where does the 8-10 estimate come from? Can you implement the algorithm using a bounded datatype (i.e. not `BigInteger`?) — Scott Wegner, Apr 22 '11 at 04:34
@Scott: By your own logic, how will you even maintain an index into the array in O(1) space? That is why I specified the Word RAM model. If n occupies O(1) words, then so does n^7. 8-10 comes from that fact that 1^7 + 2^7 + ... + n^7 is of the order n^8. — , Apr 22 '11 at 04:43
@Moron: Fair enough. I suppose the storage requirement for `n` is `O(log(n))` bits, so `n^7` would require `O(log(n^7)) = O(7*log(n)) = O(log(n))`. That is, if my math muscles are still working properly. — Scott Wegner, Apr 22 '11 at 05:08

score 8 · Answer 2 · edited May 23 '17 at 12:25

8

As @j_random_hacker pointed out, this is quite similar to Finding duplicates in O(n) time and O(1) space, and an adaptation of my answer there works here too. In pseudo-code:

for i := 1 to n
    while A[A[i]] != A[i] 
        swap(A[i], A[A[i]])
    end if
end for

for i := 1 to n
    if A[i] != i then 
        print i
    end if
end for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that although it has a nested loop, it still runs in O(N) time - a swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

edited May 23 '17 at 12:25

Community

1
1

answered Apr 22 '11 at 04:23

caf

233,326
40
323
462

Since we are skipping the 0th index and iterating upto i=N, it seems we always need to extend given arr by 1 and copy first ele to new index. Tried for One Missing One Repeating case e.g. [4,3,5,2,2] and also for Two Missing Two Repeating case e.g. [5,3,5,2,2]. Also tried for One Missing case e.g. [4,3,5,2] based on your answer to [this](https://stackoverflow.com/questions/3492302/easy-interview-question-got-harder-given-numbers-1-100-find-the-missing-numbe?page=1&tab=votes#tab-top) question, and that too seems to need k+1 additional space, instead of just k. Or did I go wrong somewhere? – Chandan Hegde Apr 18 '19 at 15:40
@dd9chndn: This is pseudo-code using 1-based arrays (because that's how the problem was formulated). – caf Apr 19 '19 at 13:31

score 4 · Answer 3 · edited Mar 11 '11 at 07:58

4

Your solution isn't bad. Here's an alternative - Sort the list and iterate through it checking adjacent numbers. When there is a gap, print all the numbers in between. If k is the length of the array and n is the number to count to, we get O(k lg k + n) time, O(1) space

edited Mar 11 '11 at 07:58

templatetypedef

362,284
104
897
1,065

answered Mar 09 '11 at 17:52

dfb

13,133
2
31
52

That depends on the sorting algorithm, that usually it's already O(n), so probably it's more efficient the OP's code. – redent84 Mar 09 '11 at 17:55
@redent84: the OPs code is already optimal, any kind of sorting algo will make it slower. – Doc Brown Mar 09 '11 at 17:59
O(1) space if the input array is mutable - but then if the input array is mutable we can play tricks to make the questioner's approach O(1) space too, in O(k) time. – Steve Jessop Mar 09 '11 at 18:00
Depends on what optimal means, for low k and high n, this is more time efficient and it is always more space efficient.... – dfb Mar 09 '11 at 18:01
k can't be smaller than n - your k is n from the question, and your n is whatever number the higher of the two gaps actually is. So there is no "low k and high n". – Steve Jessop Mar 09 '11 at 18:02
@Steve - huh? The space is from the flag array. What's the trick? – dfb Mar 09 '11 at 18:03
@spinning_plate: use the input array as a flag array - they're the same size, after all. Do an initial pass to see if there's a 1, and remember the result. Then scan up the array. For each value, set it to 0, and set the index equal to its value to 1. But before writing 1, remember the value you're about overwrite and set it next. If that value is already 1 or 0, stop doing this and return to the next element in your scan. If an element reached in the scan is 1, just move on to the next one because you've already handled the value that was there. I think that's it. – Steve Jessop Mar 09 '11 at 18:06
Actually, you can avoid the initial pass looking for 1 by using n+1 as the "flag is set" value, provided n+1 is less than UINT_MAX (or whatever type the array holds). – Steve Jessop Mar 09 '11 at 18:11

score 1 · Answer 4 · answered Feb 02 '16 at 11:26

1

A sample code snippet for finding the missing elements without sorting the array below:

     public static void series(int[] arr) {
     for (int i = 0; i < arr.length; i++) {
        while (arr[i] != i + 1) {
            int jump = arr[arr[i] - 1];
            if (jump == arr[i]) {
                break;
            }
            arr[arr[i] - 1] = arr[i];
            arr[i] = jump;
        }
     }
     System.out.println("Missing number is ");
     for (int i = 0; i < arr.length; i++) {
        if (arr[i] != i + 1) {
            System.out.println(i + 1);
        } else {
            arr[i] = -1;
        }
     }

This code works for a series of numbers from 0 to N.

answered Feb 02 '16 at 11:26

karthikbv

183
1
4
13

I think we can replace the while loop condition `while ( arr[i] != i + 1 )` with `while ( arr[arr[i] - 1] != arr[i] )` and then get rid of the `if (jump == arr[i])` condition and directly do `swap ( arr[arr[i] - 1], arr[i] )` – Chandan Hegde Apr 19 '19 at 11:01
@Chandan Hegde yes the code can be reduced, but we have not defined the swap function here. So to avoid confusion, I have used the extra variable 'jump'. – karthikbv Dec 21 '19 at 02:29

Zimbabao · Answer 5 · 2011-03-09T18:50:22.477

1

This is bit qwirky As all your numbers are positive (by problem). I ll make the number at position i-1 a negetive if i is present in array.

int i;  
for(i = 0; i < n; i++)  {  
    A[abs(A[i])-1] = -1*abs(A[abs(A[i])-1]);
 }  

for(i = 0; i < n; i++)  {  
 if(A[i]>0) {  
    printf("missing: %d", i+1);  
}

Complexity O(n), no auxiliary array user, but destroys the input array.

edited Mar 09 '11 at 18:50

answered Mar 09 '11 at 18:03

Zimbabao

8,150
3
29
36

This algorithm doesn't work. Not only that, also it causes an array index out of bounds (and segmentation violation fault - if you are lucky). – redent84 Mar 09 '11 at 18:10
I tried for your input and it gave me correct answer. You mean this is not what your interviewers expected ? – Zimbabao Mar 09 '11 at 18:11
1

Any input different from a sequence starting at 1 and ending at N will cause the error. That's a bit risky! – redent84 Mar 09 '11 at 18:11
@redent84 : Yeah thats correct I just followed as per given in question :) – Zimbabao Mar 09 '11 at 18:13
@Zimbabao You can reset the sign of the array in the second for cycle without increasing complexity – xanatos Mar 09 '11 at 18:35
I took an array A[]= {4, 2, 1, 2, 4} and did a dry run. But maybe coz I made sum gross error or so, i recieved nothng. Jst tell me if it works, and I can try again.. – letsc Mar 09 '11 at 18:36
@smartmuki : Try no, I forgot to add some absolute calls – Zimbabao Mar 09 '11 at 18:51

Sanjeevakumar Hiremath · Answer 6 · 2011-03-11T00:30:44.053

Following is one of the approaches to identify all missing numbers when an array is known to contain only lumbers between 1 to n inclusive, without using any additional space. Time complexity is O(n).

Lets take a smallest number k such that it is not supposed to be in array therfore k = n+1 (lets call it an add factor).

first loop through each array and for every a[i] we will update a[a[i] - 1] += k; after this loop every array element contains two sets of information the number which was originally in the array element + k * (number of occurances of ith number in the array).

in the second loop you could find out how many repetations of ith number by doing an integer division of the number at each location by k. And original number at ith location would be a[i] % k;

Lets work through an example

A[5] = {1,2,1,3,2};

here (addfactor) k = 5 (array length) + 1 = 6

After fisrt loop array would look like if original element is m and occurances of ith number is O(i) resulting array element would be m + k * O(i) this element divide(integer) by k you'll get occurances of ith elemnent, and %k you'd get original array.

A = {1 + 6*2, 2 + 6*2, 1 + 6*1, 3+6*0 , 2+6*0 }
A = {13, 14, 7, 3, 2 }

Following is C# code which (I'm sorry, my C is little rusty its been a while.) could be ported to any language just by replacing Printf & scanfs.

    static void Main(string[] args)
    {
        int[] A = { 1, 2, 1, 3, 2 };
        PrintDuplicateAndMissing(A);
        Console.ReadLine();
    }

    static void PrintDuplicateAndMissing(int[] array)
    {
        int addfactor = array.Length + 1;
        for (int i = 0; i < array.Length; i++)
        {
            array[array[i] - 1] += addfactor; // -1 only if array contains from 1 to n. if it is 0 to n (this -1 is not required)
        }
        for (int i = 0; i < array.Length; i++)
        {
            if ( (array[i] / addfactor) == 0 )
                Console.WriteLine(string.Format("{0} is missing", i + 1)); // i + 1 only if array is 1 to n, if 0 to n then +1 is not required
            array[i] %= addfactor; //restore original content of the array
        }
    }

score 0 · Answer 7 · answered Nov 22 '13 at 21:15

As you have given an array of n size and find the missing number when it's in a sequence.

#include<stdio.h>
main()
{
print("value of n");
scan("%d",&n);
print("enter the elements");
for(i=0;i<n;i++)
scan("%d",&a[i]);
for(i=0;i<n;i++)
{
d1[i]=a[i+1]-a[i];
temp=d1[i];
d[i]=temp;
}
for(i=0;i<n;i++)
{
if(d[i]==d[i+1]
{
c=d[i];
break;
}
}
for(i=0;i<n;i++)
b[i]=a[0]+i*c;
for(i=0;i<n;i++)
{
miss=0;
for(j=0;j<n;j++)
{
if(b[i]!=a[j])
{
miss++;
}
if(miss==n)
print("missing no. %d",b[i]);
}
}

It would find the missing when its in sequence only.

xanatos · Answer 8 · 2011-03-09T18:33:16.707

Cycle each element 0...n-1.

x = abs(A[i]) (with i = 0...n-1);

A[x - 1] can be: 
> 0: we haven't checked the element, change its sign to negative:
    A[x - 1] = -A[x - 1]
< 0: we already found the same number

At the end of the cycle, pass each A[0...n-1]. The index of the positive elements + 1 is the missing numbers.

So if

y = abs(A[i]) > 0: i + 1 is missing.

In C#

var arr = new[] { 1, 2, 1, 2, 4 };

for (int i = 0; i < arr.Length; i++) {
    int x = Math.Abs(arr[i]);
    int y = arr[x - 1];
    if (y > 0) {
        arr[x - 1] = -arr[x - 1];
    }
}

for (int i = 0; i < arr.Length; i++) {
    int x = arr[i];
    if (x > 0) {
        Console.WriteLine("Missing {0}", i + 1);
    } else {
        arr[i] = -arr[i];
    }
}

And the array is as good as new.

I think his is wrong. I think he can access a negative index element. Try 2, 1, 1, 2, 4 — xanatos, Mar 09 '11 at 18:37

score 0 · Answer 9 · answered Mar 12 '12 at 02:53

As we know we are looking for elements between 1 to N Create a Hash set containing 1 to N.

foreach(int i in input)
{
   if(hashset.contains(i))
   {
      hashset.delete(i);
   }
}

return "remaining elements in Hashset.";

The remaining elements in Hashset are the missing elements.

Finding missing elements in an array

9 Answers9

Linked