Packing bytes into a long with |= is giving unexpected results

Question

I am trying to concatenate my byte[] data into a long variable, but for some reason the code is not working as I expected.

I have a byte array whose maximum size is 8 bytes (64 bits), the same size as a long, so I am trying to concatenate this array into a long variable.

import java.math.BigInteger;

public class Main {
    public static void main(String[] args) {
        byte[] data = new byte[]{
                (byte)0xD4,(byte)0x11,(byte)0x92,(byte)0x55,(byte)0xBC,(byte)0xF9
                };

        Long l = 0l;

        for (int i = 0; i < 6; i++) {
            l |= data[i];
            l <<= 8;
            String lon = String.format("%064d", new BigInteger(Long.toBinaryString((long)l)));
            System.out.println(lon);
        }
    }
}

The results are:

1111111111111111111111111111111111111111111111111101010000000000
1111111111111111111111111111111111111111110101000001000100000000
1111111111111111111111111111111111111111111111111001001000000000
1111111111111111111111111111111111111111100100100101010100000000
1111111111111111111111111111111111111111111111111011110000000000
1111111111111111111111111111111111111111111111111111100100000000

But the final result should be something like

111111111111111110101000001000110010010010101011011110011111001

which is 0xD4,0x11,0x92,0x55,0xBC,0xF9

Swap the first two lines in the for loop, and there is no reason to use Long instead of long here. — maraca, Apr 04 '17 at 23:49

score 4 · Accepted Answer · edited May 23 '17 at 10:30

byte in Java is signed, and when you do long |= byte, the byte's value is promoted and the sign bit is extended, which essentially sets all those higher bits to 1 if the byte was a negative value.

You can do this instead:

 l |= (data[i] & 255);

This forces the byte into an int and clears the sign-extended bits before the result is promoted to a long.
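To isolate what the mask does on its own, here is a small sketch that keeps the question's loop order (OR, then shift) and adds only `& 255`. The class name `MaskOnlyFix` and the method `packMaskOnly` are hypothetical names chosen for this illustration.

```java
public class MaskOnlyFix {
    // Same loop as in the question, with only the & 255 mask added.
    // The OR-then-shift order is deliberately left unchanged.
    static long packMaskOnly(byte[] data) {
        long l = 0L;
        for (int i = 0; i < data.length; i++) {
            l |= (data[i] & 255); // mask clears the sign-extended high bits
            l <<= 8;
        }
        return l;
    }

    public static void main(String[] args) {
        byte[] data = {
            (byte) 0xD4, (byte) 0x11, (byte) 0x92, (byte) 0x55, (byte) 0xBC, (byte) 0xF9
        };
        // Prints d4119255bcf900: no leading 1s any more, but one extra zero byte at the end.
        System.out.println(Long.toHexString(packMaskOnly(data)));
    }
}
```

The leading 1s are gone; the trailing `00` byte comes from the loop order, which is the separate problem maraca points out in the comments.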

Details

Prerequisite: If the term "sign bit" does not make sense to you, then you must read What is “2's Complement”? first. I will not explain it here.

Consider:

byte b = (byte)0xB5;
long n = 0l;

n |= b; // analogous to your l |= data[i]

Note that n |= b is equivalent to n = (long)(n | b) (JLS 15.26.2). Since n is already a long, the cast does nothing, so we'll look at n = n | b.

So first n | b must be evaluated. But n and b have different types.

According to JLS 15.22.1:

When both operands of an operator &, ^, or | are of a type that is convertible (§5.1.8) to a primitive integral type, binary numeric promotion is first performed on the operands (§5.6.2).

Both operands are convertible to primitive integral types, so we consult 5.6.2 to see what happens next. The relevant rules here are:

Widening primitive conversion (§5.1.2) is applied to convert either or both operands as specified by the following rules:

...

Otherwise, if either operand is of type long, the other is converted to long.

...

Ok, well, n is a long, so according to this b must now be converted to long using the rules specified in 5.1.2. The relevant rule there is:

A widening conversion of a signed integer value to an integral type T simply sign-extends the two's-complement representation of the integer value to fill the wider format.

Well, byte is a signed integer value and it's being converted to a long, so according to this the sign bit (highest bit) is simply extended to the left to fill the space. So this is what happens in our example (imagine 64 bits here; I'm just saving space):

b = (byte)0xB5                     10110101
b widened to long  111 ... 1111111110110101
n                  000 ... 0000000000000000
n | b              111 ... 1111111110110101

And so n | b evaluates to 0xFFFFFFFFFFFFFFB5, not 0x00000000000000B5. That is, when that sign bit is extended and the OR operation is applied, you've got all those 1's there essentially overwriting all of the bits from the previous bytes you've OR'd in, and your final results, then, are incorrect.
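A quick way to check this is to OR the byte into a long that already holds earlier bytes. This is a minimal sketch; `SignExtensionDemo` and `orSigned` are illustrative names, and `0xD41100L` simply stands in for a long that already contains two packed bytes.

```java
public class SignExtensionDemo {
    // ORs a byte into a long the way the question does: the byte is
    // implicitly widened to long with sign extension before the OR.
    static long orSigned(long n, byte b) {
        n |= b;
        return n;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xB5;
        // Starting from 0: prints ffffffffffffffb5, not b5.
        System.out.println(Long.toHexString(orSigned(0L, b)));
        // Starting from a long holding 0xD411 in bits 8..23: the sign-extended
        // 1s overwrite those bits, so this also prints ffffffffffffffb5.
        System.out.println(Long.toHexString(orSigned(0xD41100L, b)));
    }
}
```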

It's all the result of byte being signed and Java requiring long | byte to be converted to long | long prior to performing the calculation.

If you're unclear on the implicit conversions happening here, here is the explicit version:

n = n | (long)b;

Details of workaround

So now consider the "workaround":

byte b = (byte)0xB5;
long n = 0l;

n |= (b & 255);

So here, we evaluate b & 255 first.

So from JLS 3.10.1 we see that the literal 255 is of type int.

This leaves us with byte & int. The rules are about the same as above although we invoke a slightly different case from 5.6.2:

Otherwise, both operands are converted to type int.

So, per those rules, the byte must first be converted to an int. In this case we have:

(byte)0xB5                                10110101
promote to int    11111111111111111111111110110101  (sign extended)
255               00000000000000000000000011111111
&                 00000000000000000000000010110101

The result is an int, which is signed, but as you can see it's now a positive number and its sign bit is 0.
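The int-level step can be checked directly. This sketch (class and method names are illustrative) prints both the plain promotion and the masked value; `Byte.toUnsignedInt`, available since Java 8, is included as a library equivalent of `& 255`.

```java
public class IntPromotionDemo {
    // Plain promotion: the byte's sign bit is copied into the upper 24 bits.
    static int promote(byte b) {
        return b;
    }

    // The workaround's first step: promote to int, then keep only the low 8 bits.
    static int mask(byte b) {
        return b & 255;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xB5;
        System.out.println(promote(b) + " " + Integer.toHexString(promote(b))); // -75 ffffffb5
        System.out.println(mask(b) + " " + Integer.toHexString(mask(b)));       // 181 b5
        // Java 8+ library method with the same result as mask(b).
        System.out.println(Byte.toUnsignedInt(b));                              // 181
    }
}
```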

The next step is to evaluate n | (b & 255). Again, per the rules above, the int is widened to a long with its sign bit extended, but this time the sign bit is 0:

b & 255                    00000000000000000000000010110101
convert to long  000 ... 0000000000000000000000000010110101
n                000 ... 0000000000000000000000000000000000
n | (b & 255)    000 ... 0000000000000000000000000010110101

And now we get the intended value.

The workaround works by converting b to an int as an intermediate step and setting the high 24 bits to 0, thus letting us convert that to a long without the original sign bit getting in the way.

If you're unclear on the implicit conversions happening here, here is the explicit version:

n = n | (long)((int)b & 255);
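To confirm that the implicit and explicit forms agree, here is a sketch comparing them, together with `Byte.toUnsignedLong` (Java 8+), which performs the same unsigned widening in a single call. The class and method names are illustrative, and `0xD41100L` stands in for a long that already holds two packed bytes; the masked OR leaves those bits intact.

```java
public class UnsignedWidening {
    // The workaround as written in the answer (conversions left implicit).
    static long implicitForm(long n, byte b) {
        n |= (b & 255);
        return n;
    }

    // The same expression with every conversion spelled out.
    static long explicitForm(long n, byte b) {
        return n | (long) ((int) b & 255);
    }

    // Java 8+ library helper that widens a byte to long without sign extension.
    static long libraryForm(long n, byte b) {
        return n | Byte.toUnsignedLong(b);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xB5;
        long packed = 0xD41100L;
        // All three print d411b5: the earlier bytes survive the OR.
        System.out.println(Long.toHexString(implicitForm(packed, b)));
        System.out.println(Long.toHexString(explicitForm(packed, b)));
        System.out.println(Long.toHexString(libraryForm(packed, b)));
    }
}
```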

Other stuff

As maraca mentions in the comments, also swap the first two lines in your loop; otherwise you end up shifting the whole thing 8 bits too far to the left at the end (that's why your low 8 bits are zero).

Also, I notice that your expected final result is padded with leading 1s. If that's what you want, you can start with -1L instead of 0L (in addition to the other fixes).
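Putting the fixes together, a corrected loop might look like the sketch below. It assumes the array holds at most 8 bytes with the first byte most significant; `PackBytes` and `pack(byte[], long)` are hypothetical names, and the `start` parameter selects between the 0L and -1L starting values just described.

```java
public class PackBytes {
    // Packs up to 8 bytes into a long, first byte most significant.
    // start = 0L gives a zero-padded result; start = -1L pads with 1s.
    static long pack(byte[] data, long start) {
        if (data.length > 8) {
            throw new IllegalArgumentException("at most 8 bytes fit in a long");
        }
        long l = start;
        for (byte b : data) {
            l <<= 8;         // make room first (shift before OR)...
            l |= (b & 255);  // ...then OR in the byte with its sign bits masked off
        }
        return l;
    }

    public static void main(String[] args) {
        byte[] data = {
            (byte) 0xD4, (byte) 0x11, (byte) 0x92, (byte) 0x55, (byte) 0xBC, (byte) 0xF9
        };
        System.out.println(Long.toHexString(pack(data, 0L)));  // d4119255bcf9
        System.out.println(Long.toHexString(pack(data, -1L))); // ffffd4119255bcf9
    }
}
```

Shifting before the OR means the last byte lands in the low 8 bits instead of being pushed up by one extra shift, and with a full 8-byte array the -1L padding is shifted out entirely.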
