Finding a missing integer, given 4 billion unsorted integers

Question

I came across this interview questions and I was trying to understand how to approach this problem. I read this question on SO. I understood the approach of the author of the post, however I do not understand the approach suggested in the accepted answer. So I moved to this blog. According to this blog we can calculate the number of zeroes and ones at each of the bit positions and from that we can find out the missing number. But then for that the file should have 2^32-1 numbers which is greater than 4 billion. So that method should not work right? I am sure there is something wrong in my understanding, but I just can't figure out the missing link.

Just find the maximum and add one. Or the minimum and subtract one. — Niklas B., Apr 01 '14 at 00:57
@Blorgbeard OP is explicitely referencing that post, saying that he has problems understanding the top answer there. It's more of a follow-up than a duplicate — Niklas B., Apr 01 '14 at 01:02
The linked question has an explanation edited into the bottom of it. If you don't understand the explanation, write a comment on that question. — Blorgbeard, Apr 01 '14 at 01:03
Just in case the links don't live forever, it might be good to summarize the outline of the problem. — Edward, Apr 01 '14 at 01:06

score 1 · Answer 1 · answered Apr 01 '14 at 01:00

1

If you had a "complete" sequence of numbers from 0 to 2^N-1 then the number of bits set in each bit position would be equal (and equal to (2^N)/2).

If only one number is missing, then it's 1 bits correspond to the bit positions that are short one bit count.

Note that this only works for powers of 2, but possibly one can work out more complex formulae for "odd" counts.

answered Apr 01 '14 at 01:00

Hot Licks

47,103
17
93
151

1

But then the file has 4 billion integers and a complete sequence of numbers from 0 to 2^32-1 we will need the file to have 4294967295 numbers, right? So we have many missing numbers for that approach to work. – newbie Apr 01 '14 at 01:08
@newbie - Right (up to a point). Think first of the case with no missing numbers, then what it's like with one missing number. With no missing numbers all of the bit counts would be equal, but with one missing number some (but not all) will be short one count. – Hot Licks Apr 01 '14 at 01:13
And, in fact, for the 2^N-1 case you don't need to even count bits, but can simply calculate "parity" across the set. The resulting parity value will be the missing one. – Hot Licks Apr 01 '14 at 01:16
@HotLicks: that won't always work for the problem as stated. It says that the file contains *up to* 4 billion numbers. If it contained only 4, there are many "missing" numbers that might not be detected using the parity method. – Edward Apr 01 '14 at 02:08
I'm thinking that the counts should work for some range not equal to a power of 2 (minus 1), so long as ONLY one number in the range is missing. And actually one could make the parity scheme work, by artificially cycling through the numbers from the end of the range to the next power of 2 (or otherwise figuring out the correct fudge factor). – Hot Licks Apr 01 '14 at 02:27
(The original question is pretty ambiguous, and apparently has mutated over time.) – Hot Licks Apr 01 '14 at 02:28

score 0 · Answer 2 · answered Apr 01 '14 at 01:03

0

add the intergers up using long
subtract the result from (N+1)*(N+2)/2 where N is the number of integers in your file. The result is your missing number

Example:

file contains 1,3
sum = 4
N=2, so (N+1)(N+2)/2 = (2+1)(2+2)/2 = 6
6-sum=6-4=2
2 is your missing number

answered Apr 01 '14 at 01:03

datahaki

600
7
23

Of course, this scheme gets kind of ugly with 4G 32 bit numbers. – Hot Licks Apr 01 '14 at 01:14
why? the memory usage is in the order of 2 long (64-bit) variables. – datahaki Apr 01 '14 at 01:16
when N = 2^32 - 1 = 4294967295, then the sum is at most 9223372039002259456 while a long can hold 18446744073709551615 – datahaki Apr 01 '14 at 02:42
Yes, you're right, but the problem in the link is not saying the range of those integers is from 1 to 2^32 - 1, it only indicates that it is 4 billion integers. – Pham Trung Apr 01 '14 at 02:44

Finding a missing integer, given 4 billion unsorted integers

2 Answers2

Linked