Find missing number from 4 billion (again)

Question

It seems that a question that has been asked a lot of times is about to detect a missing number among 4 billion numbers.
The recomended approach appears to be to use a bitset (when memory constraints are part of the problem).
An example post is this:find-an-integer-not-among-four-billion-given-ones and I could also link to more here in SO.

My problem is the following: The bitset approach seems to implicitely asume that the numbers are non-negative. As an example in the post I linked to, there is this code snippet in Java:

int radix = 8; 
byte[] bitfield = new byte[0xffffffff/radix]; 
void F() throws FileNotFoundException{ 
    Scanner in = new Scanner(new FileReader("a.txt")); 
    while(in.hasNextInt()){ 
        int n = in.nextInt(); 
        bitfield[n/radix] |= (1 << (n%radix)); 
    } 

    for(int i = 0; i< bitfield.lenght; i++){ 
        for(int j =0; j<radix; j++){ 
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j); 
        } 
    } 
}

But in Java all the integers are signed. As a result the code as posted will break for negative numbers. This int n = in.nextInt(); can return a negative.

So I guess somehow my confusion is about 2 parts on this kind of problem/exercise:
1) Can a bitset be used to represent negative numbers? How?
2) Is the solution to the problem somehow related to a specific programming language? E.g. in C++ where there are unsigned numbers I guess one could accept the assumption that the range is non-negative.

Could someone please help me understand this?

What prevents you from simply treating the numbers as `long`, and adding 2^31 to make them all positive? — Oliver Charlesworth, Apr 16 '12 at 09:30
@OliCharlesworth:If I did that, won't I get same bit index for 2 numbers (a positive and negative)?How would I then find the missing one?The corresponding bit index on scanning will be set to true but only one would be present — Cratylus, Apr 16 '12 at 09:44
I'm suggesting adding an offset to *all* values, not just the negative ones. — Oliver Charlesworth, Apr 16 '12 at 09:46
@OliCharlesworth:I see what you mean.But in this case I am correct.The post I linked to has a broken code.Right? — Cratylus, Apr 16 '12 at 09:48

score 4 · Accepted Answer · answered Apr 16 '12 at 09:35

4

try

long n = in.nextInt() & 0xFFFFFFFFL; 
bitfield[(int) (n/radix)] |= (1 << (n%radix));

or use

final int bits = 3; 

bitfield[(int) (n >>> bits)] |= (1 << (n & ((1 << bits) -1)));

Later

System.out.print((int) (i*radix+j));

You may find that using and int[] or long[] is slightly faster than using a byte[]

answered Apr 16 '12 at 09:35

Peter Lawrey

525,659
79
751
1,130

When you downcast `bitfield[(int) (n/radix)]` can't we end up with negative number? – Cratylus Apr 16 '12 at 09:43
If you have a range of `[0, 2^32)` and you divide it by 8, you get a range of `[0, 2^29)` or a maximum around 512 million. – Peter Lawrey Apr 16 '12 at 09:46
+1:I see now.I can't grasp the solution with the `bits=3`.I assume that even with this aproach the `n` is calculated as the aproach 1?Also why would the `int[]` be faster that an `byte[]`? – Cratylus Apr 16 '12 at 09:51
On 32/64 systems, the memory word size is 32/64 bit and to extract a byte it has to read the word, mask out the byte you want to change and then replace that byte in the word etc. i.e. rather simular to what you have to do to change on bit in a byte. The uses of bits is faster because shifting and masking is faster than division and modulus. Compare to reading text from a file or writing to the console its a trivial difference, so its for you own knowledge really. – Peter Lawrey Apr 16 '12 at 09:54
So `in.nextInt() & 0xFFFFFFFFL;` guarantees that the sign bit is always zeroed out when you assign to long?That is how you end up with [0, 2^32)? – Cratylus Apr 16 '12 at 10:03

score 3 · Answer 2 · answered Apr 16 '12 at 09:30

3

Well there's the obvious solution: Since we know that every number has a specific range [a, b) you can just add the lower boundary to all numbers to get guaranteed positive numbers.

Now that doesn't work for arbitrarily long numbers, but then that's usually not a problem

answered Apr 16 '12 at 09:30

Voo

29,040
11
82
156

`add the lower boundary to all numbers to get guaranteed positive numbers`.In the range [a,b) where it is [-2.147.483.648 , 2,147,483,648) what is the lower boundary that you point out? – Cratylus Apr 16 '12 at 09:37
1

@user384706 Well quite obviously -2^31 (you even used the same notation?). Since there are no unsigned numbers in java you'll have to use the next larger size. If you add -2^31 to the integer range you end up with [0, 2^32) just as expected. – Voo Apr 16 '12 at 09:43
So you are saying translate to [0, 2^32).I understand now what you mean.But what do you mean that `doesn't work for arbitrarily long numbers, but then that's usually not a problem`? Could you please explain more on this point? – Cratylus Apr 16 '12 at 09:47
@user384706 If you don't have a lower boundary you can't add it to all numbers. I.e. assume your range are the integers (in the mathematical sense), there's no "lowest integer" which you could add to make sure all your numbers are positive. But then your proposed problem doesn't make much sense if your range isn't limited, so that's no problem there. – Voo Apr 16 '12 at 10:07

Find missing number from 4 billion (again)

2 Answers2

Linked