Implicit left-padding of the binary literal in Java

Question

While constructing a mask to extract the most significant bit of a two's complement number, I ran into unexpected behavior.

To check whether the most significant bit of a signed 8-bit number is set, I can extract the bit as follows.

byte value = -1;
long byteSignMask = 0b1000_0000;
long msb = value & byteSignMask;

The result is identical regardless of whether I use 0b1000_0000 or 1L << 7 for byteSignMask. In fact, the following code passes.

long byteSign1 = 1L << 7;
long byteSign2 = 0b1000_0000;
// OK
assertEquals(byteSign1, byteSign2);

But when I did the same for the int type, the outcome was unexpected.

long intSign1 = 1L << 31;
long intSign2 = 0b1000_0000_0000_0000_0000_0000_0000_0000;

// Fail: expected:<2147483648> but was:<-2147483648>
assertEquals(intSign1, intSign2);

Actually, they are different.

// intSign1 = 10000000000000000000000000000000
System.out.println("intSign1 = " + Long.toBinaryString(intSign1));
// intSign2 = 1111111111111111111111111111111110000000000000000000000000000000
System.out.println("intSign2 = " + Long.toBinaryString(intSign2));

It looks like the literal mask for the int (intSign2) is left-padded with 1s, while the shift operation (intSign1) causes no such effect.

Why is the integer expressed by the binary literal automatically left-padded with 1s? Is there any official documentation describing this behavior?

Sweeper · Accepted Answer · 2020-08-07T02:15:13.447

intSign2 you have here:

0b1000_0000_0000_0000_0000_0000_0000_0000

is an int literal, not a long literal.

So you are saying "I want the int value represented by this bit pattern".

A single 1 followed by 31 0s, interpreted as a 32-bit two's complement signed integer (int), is -2147483648. This value then gets "widened" to a long when you assign it to the long variable intSign2. That's where the padded 1s come from.
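As a minimal sketch of that sequence (the class name IntLiteralWidening and its members are made up for illustration), the literal can be stored in an int first and then assigned to a long, which makes the two steps visible separately:

```java
public class IntLiteralWidening {
    // No L suffix, so this is an int literal. Its value is Integer.MIN_VALUE.
    static final int AS_INT = 0b1000_0000_0000_0000_0000_0000_0000_0000;

    // Assigning an int to a long performs an implicit widening conversion.
    static long widened() {
        long result = AS_INT;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(AS_INT == Integer.MIN_VALUE); // true
        System.out.println(widened());                   // -2147483648
        // The upper 32 bits of the long are all 1s after widening.
        System.out.println(Long.toBinaryString(widened()));
    }
}
```

The value is already negative before the assignment to long happens; the widening step only preserves it.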

To make it a long literal, you would have to add an L suffix:

0b1000_0000_0000_0000_0000_0000_0000_0000L
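A short sketch comparing the two spellings side by side may help; the class and constant names here are hypothetical, and the only difference between the two constants is the trailing L:

```java
public class LongLiteralSuffix {
    // int literal (Integer.MIN_VALUE), widened to long on assignment.
    static final long WITHOUT_SUFFIX = 0b1000_0000_0000_0000_0000_0000_0000_0000;

    // long literal: the same 32 digits are read as a 64-bit value.
    static final long WITH_SUFFIX = 0b1000_0000_0000_0000_0000_0000_0000_0000L;

    public static void main(String[] args) {
        System.out.println(WITHOUT_SUFFIX);          // -2147483648
        System.out.println(WITH_SUFFIX);             // 2147483648
        System.out.println(WITH_SUFFIX == 1L << 31); // true
    }
}
```

With the suffix, the literal matches the value that 1L << 31 produces in the question.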

Why is byteSign2 padded with left 0s, while intSign2 is padded with left 1s?

When you specify a binary integer literal, and the number of bits you specify is fewer than the bit size of the data type, it will always get left-padded with 0s. So in the case of byteSign2, you said 0b1000_0000, which is actually equivalent to this binary literal:

0b0000_0000_0000_0000_0000_0000_1000_0000

In the case of intSign2, you specified the full 32 bits of int, so no padding is done at all.
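To illustrate the zero-padding of short literals, here is a small sketch (the class name ShortLiteralPadding and its constants are assumptions for the example) checking that the 8-digit and the fully spelled-out 32-digit forms denote the same int:

```java
public class ShortLiteralPadding {
    // Eight digits given; the remaining 24 high bits of the int are 0.
    static final int SHORT_FORM = 0b1000_0000;

    // The same value with all 32 digits written out.
    static final int FULL_FORM = 0b0000_0000_0000_0000_0000_0000_1000_0000;

    public static void main(String[] args) {
        System.out.println(SHORT_FORM);              // 128
        System.out.println(SHORT_FORM == FULL_FORM); // true
        // Bit 31 of the int is 0, so widening to long keeps the value positive.
        long asLong = SHORT_FORM;
        System.out.println(asLong == 1L << 7);       // true
    }
}
```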

The left-padded 1s are a result of the int-to-long conversion that took place. According to the language specification, this conversion works like this:

A widening conversion of a signed integer value to an integral type T simply sign-extends the two's-complement representation of the integer value to fill the wider format.

Because the conversion "sign-extends", it pads with 1s if the sign bit is 1 and with 0s if the sign bit is 0 (this preserves the sign of the number: negative numbers remain negative, and so on). For your binary literal, the sign bit is 1, so it pads with 1s.
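The following sketch contrasts the default sign-extending conversion with a zero-extending one. The class and method names are hypothetical; Integer.toUnsignedLong is a standard library method (available since Java 8) that treats the int's 32 bits as an unsigned value:

```java
public class SignExtensionDemo {
    // Implicit int-to-long widening: the upper 32 bits copy the sign bit.
    static long signExtend(int value) {
        return value;
    }

    // Zero extension: the upper 32 bits are always 0.
    static long zeroExtend(int value) {
        return Integer.toUnsignedLong(value);
    }

    public static void main(String[] args) {
        int mask = 0b1000_0000_0000_0000_0000_0000_0000_0000;
        System.out.println(signExtend(mask)); // -2147483648
        System.out.println(zeroExtend(mask)); // 2147483648
        // For non-negative ints both conversions agree.
        System.out.println(signExtend(128) == zeroExtend(128)); // true
    }
}
```

If the intent is to keep a 32-bit pattern as-is inside a long, the zero-extending variant (or an L-suffixed literal) avoids the padded 1s.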

edited Aug 07 '20 at 02:15

answered Aug 05 '20 at 06:15

Sweeper


Almost right. The 1 is not a sign bit, but the int-literal overflows, which is why it is wrapped to the negative end of the values, as in `Integer.MAX_VALUE + 1 == Integer.MIN_VALUE`. – TreffnonX Aug 05 '20 at 06:17
@TreffnonX Does the interpretation matter at all? Will it make a difference anywhere? Also, I looked at the JLS, section 3.10.1, and it doesn't suggest that there is overflow. It simply states that `0b1000_0000_0000_0000_0000_0000_0000_0000` is the most negative binary integer literal. – Sweeper Aug 05 '20 at 06:22
Yes! It makes a difference if the values are anything but `0b1000_0000_0000_0000_0000_0000_0000_0000`, because e.g. the value of `0b1000_0000_0000_0000_0000_0000_0000_0001 == Integer.MIN_VALUE + 1`. If it was a sign bit, solely, then that value would equal `-1`. This is just how two's complement works. And that is why it makes a huge difference. – TreffnonX Aug 05 '20 at 06:56
@TreffnonX I think we are using different definitions for the sign bit here :) In two's complement, the MSB is also called the sign bit, the bit that represents -2^31. You probably mistook my use of "sign bit" as the sign bit in a sign-and-magnitude representation? Anyway, I deleted the sentence to avoid confusion. – Sweeper Aug 05 '20 at 07:02
Okay, I think you are right in that the MSB is also called sign bit in two's complement, though it makes it confusing to talk about. It is not a sign in terms of the decimal system, but a sign in terms of which number range is represented. – TreffnonX Aug 05 '20 at 07:08
Sorry, I do not fully grasp what you meant. Why is byteSign2 padded with left 0s, while intSign2 is padded with left 1s? – Kai Sasaki Aug 07 '20 at 02:04
See the edited answer. Is it clearer now? @KaiSasaki – Sweeper Aug 07 '20 at 02:15
