2

I saw the following question:

Given an array A of integers, find an integer k that is not present in A. Assume that the integers are 32-bit signed integers.

How to solve it?

Thank you

//// update -- This is the provided solution - but I don't think it is correct //// Consider a very simple hash function F(x)=x mod (n+1). we can build a bit-vector of length n+1 that is initialized to 0 and for every element in A, we set bit F(A[i]) to 1.Since there are only n elements in the array, we can find the missing element easily.

I think the above solution is wrong.

For example, A [ 2, 100, 4 ], then both 4 and 100 will match to the same place.

q0987
  • 34,938
  • 69
  • 242
  • 387
  • 1
    @leppie, Obviously, they are different. – q0987 Jan 22 '11 at 20:26
  • And is this homework or an interview question? – H H Jan 22 '11 at 20:27
  • @Henk, this is an interview question from the book Algorithms For Interviews and the solution is simply wrong. – q0987 Jan 22 '11 at 20:28
  • @leppie, the question above indicates that integers are signed integers. – q0987 Jan 22 '11 at 20:29
  • @q0987: And how does my example violate that rule? Does it even matter? – leppie Jan 22 '11 at 20:30
  • 1
    @leppie, I have to say you may read the question careful. your original solution has nothing to do with this question at all. – q0987 Jan 22 '11 at 20:31
  • 1
    @q0987: OK, I think I get it. You want an optimal solution for finding an integer not in a set/array A? If so, define the question better. I would be seriously upset if I was asked such a vague and ambiguous question in an interview. – leppie Jan 22 '11 at 20:34
  • @leppie, I kind of agree with you. but this question is a little hard to solve. Pls see the posted "solution" but I don't think it is correct. – q0987 Jan 22 '11 at 20:37
  • @q0987 If that's the case, then you're leaving out some crucial information. Perhaps you could tell us why you think it's wrong as that might reveal what you left out in the question? – moinudin Jan 22 '11 at 20:39
  • @q0987: Removed previous comments to prevent further confusion :) It could be quite tricky. – leppie Jan 22 '11 at 20:39
  • 2
    @q0987 Regarding the update: It mentions "the missing element". That is totally different to the question you asked. The solution you post finds the missing number from the range 1..n+1. Voting to close as not a real question. – moinudin Jan 22 '11 at 20:46
  • @q0987. if in solution from the book numbers that are bigger then (n+1) will be ignored, then, apparently you will get the correct solution (and get the missing number in the range [0..n+1] – Max Jan 22 '11 at 20:50
  • @q0987, even not ignored, but as marcog already written you will get the missing number from [0..n+1], the solution from book is correct. – Max Jan 22 '11 at 20:51
  • I think the above solution exactly answers the question: It finds *some* integer which is not in A. This integer happens to be in the range 0..n, but that renders neither the question nor the answer invalid. And it is the simplest O(n) solution I can think of -- the "max + 1" ideas do not work due to overflow issues. – Sven Marnach Jan 22 '11 at 21:20
  • @Sven, do you see the statement in the question "integers are 32-bit signed integers". Also given [0, 4, 200], the solution provided by the book will not return correct answer – q0987 Jan 22 '11 at 21:31
  • @q087, why solution from the book will not return correct answer? why??!! you wrote "then both 4 and 100 will match to the same place". And so? Go on with your considiration. why the answer is incorrect if 4 and 100 match the same place? – Max Jan 22 '11 at 21:40
  • @q0987: The more integers are mapped to the same hash value, the quicker you will find an answer. The only things that are important is that (a) there always is some hash value no element of the array maps to and (b) your hash function easily allows to find a value which maps to a given a hash. – Sven Marnach Jan 22 '11 at 21:54
  • Similar to: http://stackoverflow.com/questions/4696257/efficient-algorithm-to-find-first-available-name, although that has a stronger requirement on the answer. – Steve Jessop Jan 22 '11 at 22:56

3 Answers3

4

If I'm interpreting the question correctly, (find any integer not in A) and n is the number of elements in A, then the answer is correct as given.

The fact that the hash function can have collisions doesn't really matter; By the pigeonhole principle in reverse, there will be some bit in the bit vector that is not set - because it has more bits than there are elements in A. The corresponding value is an integer not in A. In the case where the hash function has collisions, there will actually be more than one bit not set, which works in favor of the algorithm's correctness rather than against it.

To illustrate, here's how your example would work out:

A = [2, 100, 4]
n = length(A) = 3
f(x) = x mod 4
map(f,A) = [2, 0, 0]

so the final bit vector will be:

[1,0,1,0]

From there, we can arbitrarily choose any integer corresponding to any 0 bit, which in this case is any odd number.

mokus
  • 3,490
  • 16
  • 22
  • and as I understood your solution is exactly the solution from the book. They are both, of course, correct. – Max Jan 22 '11 at 21:41
  • 2
    Yes, that's what I was getting at - the solution quoted in the question is exactly right, I was just illustrating it because the question presented `A=[2,100,4]` as a possible counterexample. – mokus Jan 22 '11 at 22:02
1
max (items) + 1

springs to mind. Of course it would fail if one of the elements was 2^31-1 and another was -(2^32).

Failing that, sort them and look for an interval.

Paul Johnson
  • 17,438
  • 3
  • 42
  • 59
  • good try. This was listed as one of the solution but it doesn't work always. - thx – q0987 Jan 22 '11 at 20:34
  • OK, some sequences are not going to have gaps, but assuming that there does exist a number not in the array (like, if its less than 16GB in size) then max+1 will work instead. Or have I missed something subtle? – Paul Johnson Jan 22 '11 at 20:37
1

Since there appear to be no upperbounds or other constraints:

for (int i = int.MinValue; i <= int.MaxValue; i++)
{
    if (! A.Contains(i))
    {
       // found it !
       break;
    }
}
H H
  • 263,252
  • 30
  • 330
  • 514