While constructing a mask to extract the most significant bit of a two's complement number, I ran into unexpected behavior.

To check whether the most significant bit is set in a signed 8-bit number, I can extract the bit as follows.

byte value = -1;
long byteSignMask = 0b1000_0000;
// Nonzero (128) when the most significant bit of the byte is set
long msb = value & byteSignMask;

The result is identical regardless of whether I use 0b1000_0000 or 1L << 7 for byteSignMask. In fact, the following code passes.

long byteSign1 = 1L << 7;
long byteSign2 = 0b1000_0000;
// OK
assertEquals(byteSign1, byteSign2);

But when I did the same for the int type, the outcome was not what I expected.

long intSign1 = 1L << 31;
long intSign2 = 0b1000_0000_0000_0000_0000_0000_0000_0000;

// Fail: expected:<2147483648> but was:<-2147483648>
assertEquals(intSign1, intSign2);

Actually, they are different.

// intSign1 = 10000000000000000000000000000000
System.out.println("intSign1 = " + Long.toBinaryString(intSign1));
// intSign2 = 1111111111111111111111111111111110000000000000000000000000000000
System.out.println("intSign2 = " + Long.toBinaryString(intSign2));

It looks like the literal mask of the integer (intSign2) is left-padded with 1s, while the shift operation (intSign1) does not cause such an effect.

Why is the integer expressed by the binary literal automatically left-padded with 1s? Is there any official documentation describing this behavior?

Kai Sasaki

1 Answer

The value you assign to intSign2 here:

0b1000_0000_0000_0000_0000_0000_0000_0000

is an int literal, not a long literal.

So you are saying "I want the int value represented by this bit pattern".

A single 1 followed by 31 0s, interpreted as a 32-bit two's complement signed integer (int), is -2147483648. This value then gets "widened" to a long when you assign it to the long variable intSign2. That's where the padded 1s come from.

To make it a long literal, you would have to add an L suffix:

0b1000_0000_0000_0000_0000_0000_0000_0000L
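
As a minimal sketch of the difference the suffix makes, the following compares the two literals side by side. The class and method names (LiteralWidthDemo, fromIntLiteral, fromLongLiteral) are made up for illustration and do not come from the question.

```java
// Hypothetical demo class; the names here are illustrative only.
class LiteralWidthDemo {
    // No suffix: this is an int literal whose value is Integer.MIN_VALUE.
    // Returning it as a long widens it, which sign-extends the value.
    static long fromIntLiteral() {
        return 0b1000_0000_0000_0000_0000_0000_0000_0000;
    }

    // L suffix: the same 32 digits now form a long literal,
    // so the upper 32 bits are zero and the value is positive.
    static long fromLongLiteral() {
        return 0b1000_0000_0000_0000_0000_0000_0000_0000L;
    }

    public static void main(String[] args) {
        System.out.println(fromIntLiteral());                // -2147483648
        System.out.println(fromLongLiteral());               // 2147483648
        System.out.println(fromLongLiteral() == (1L << 31)); // true
    }
}
```

With the suffix in place, the literal matches 1L << 31, so the assertion from the question should pass.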

Why is byteSign2 padded with left 0s, while intSign2 is padded with left 1s?

When you write a binary integer literal with fewer bits than the bit width of its data type, it is always left-padded with 0s. So in the case of byteSign2, you wrote 0b1000_0000, which has no L suffix and is therefore an int literal, equivalent to this 32-bit binary literal:

0b0000_0000_0000_0000_0000_0000_1000_0000

In the case of intSign2, you specified the full 32 bits of int, so no padding is done at all.
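
A short sketch of the zero-padding case, assuming nothing beyond standard Java binary literals. The class and method names (ZeroPaddingDemo, shortForm, fullForm) are hypothetical.

```java
// Hypothetical demo class; the names here are illustrative only.
class ZeroPaddingDemo {
    // Eight digits written out: still an int literal, value 128.
    static int shortForm() {
        return 0b1000_0000;
    }

    // The same value with all 32 bits spelled out; bit 31 (the sign bit) is 0.
    static int fullForm() {
        return 0b0000_0000_0000_0000_0000_0000_1000_0000;
    }

    public static void main(String[] args) {
        System.out.println(shortForm());               // 128
        System.out.println(shortForm() == fullForm()); // true
        // Widening a non-negative int to long pads with 0s, so this matches 1L << 7.
        long widened = shortForm();
        System.out.println(widened == (1L << 7));      // true
    }
}
```

Because the int value 128 is non-negative, widening it to long leaves it unchanged, which is why the byte-sized mask in the question behaved as expected.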

The left-padded 1s are a result of the int-to-long conversion that took place. According to the language specification, this conversion works like this:

A widening conversion of a signed integer value to an integral type T simply sign-extends the two's-complement representation of the integer value to fill the wider format.

Because the conversion "sign-extends", it pads with 1s if the sign bit is 1 and with 0s if the sign bit is 0. This preserves the value of the number: negative numbers remain negative, and non-negative numbers remain non-negative. For your binary literal, the sign bit is 1, so it pads with 1s.
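
The sign extension can be observed directly, and it can be avoided when the 32 bits are meant to be read as unsigned. The sketch below is one way to show this; the class and method names (SignExtensionDemo, widen, widenUnsigned) are made up for illustration, while Integer.toUnsignedLong and Long.toBinaryString are standard library methods.

```java
// Hypothetical demo class; the names here are illustrative only.
class SignExtensionDemo {
    // Implicit int-to-long widening: sign-extends the 32-bit value.
    static long widen(int value) {
        return value;
    }

    // Zero-extends instead by masking off the upper 32 bits after widening.
    static long widenUnsigned(int value) {
        return value & 0xFFFF_FFFFL;
    }

    public static void main(String[] args) {
        int signBitSet = Integer.MIN_VALUE; // bit pattern: 1 followed by 31 zeros

        // Prints 33 ones followed by 31 zeros (64 digits in total).
        System.out.println(Long.toBinaryString(widen(signBitSet)));

        // Prints a single 1 followed by 31 zeros.
        System.out.println(Long.toBinaryString(widenUnsigned(signBitSet)));

        // The standard library offers the same zero-extending conversion.
        System.out.println(Integer.toUnsignedLong(signBitSet) == (1L << 31)); // true
    }
}
```

Masking with 0xFFFF_FFFFL or calling Integer.toUnsignedLong is an alternative to the L suffix when the int value comes from somewhere other than a literal.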

Sweeper
  • Almost right. The 1 is not a sign bit, but the int-literal overflows, which is why it is wrapped to the negative end of the values, as in `Integer.MAX_VALUE + 1 == Integer.MIN_VALUE`. – TreffnonX Aug 05 '20 at 06:17
  • @TreffnonX Does the interpretation matter at all? Will it make a difference anywhere? Also, I looked at the JLS, section 3.10.1, and it doesn't suggest that there is overflow. It simply states that `0b1000_0000_0000_0000_0000_0000_0000_0000` is the most negative binary integer literal. – Sweeper Aug 05 '20 at 06:22
  • Yes! It makes a difference if the values are anything but `0b1000_0000_0000_0000_0000_0000_0000_0000`, because e.g. the value of `0b1000_0000_0000_0000_0000_0000_0000_0001 == Integer.MIN_VALUE + 1`. If it was a sign bit, solely, then that value would equal `-1`. This is just how two's complement works. And that is why it makes a huge difference. – TreffnonX Aug 05 '20 at 06:56
  • @TreffnonX I think we are using different definitions for the sign bit here :) In two's complement, the MSB is also called the sign bit, the bit that represents -2^31 in a 32-bit int. You probably mistook my use of "sign bit" as the sign bit in a sign-and-magnitude representation? Anyway, I deleted the sentence to avoid confusion. – Sweeper Aug 05 '20 at 07:02
  • Okay, I think you are right in that the MSB is also called sign bit in two's complement, though it makes it confusing to talk about. It is not a sign in terms of the decimal system, but a sign in terms of which number range is represented. – TreffnonX Aug 05 '20 at 07:08
  • Sorry, I do not fully grasp what you meant. Why is byteSign2 padded with left 0s, while intSign2 is padded with left 1s? – Kai Sasaki Aug 07 '20 at 02:04
  • See the edited answer. Is it clearer now? @KaiSasaki – Sweeper Aug 07 '20 at 02:15