9

I am currently trying to wrap my head around bitwise and bit shift operators in Java. Although they make sense to me in simplified toy examples (basically positive integers), my understanding falls apart as soon as negatives are involved, and in some other cases. I tried searching all over the Internet with two search engines and I even checked the Java specification. I can't find any source that properly describes how bitwise and bit shift operators work in Java.

One function in the Java standard library that is especially confusing to me is java.lang.Integer.toUnsignedLong(int). The source from OpenJdk is shown here (LGPLv2 with classpath exception), with an excerpt in the Javadoc:

/**
 * Converts the argument to a {@code long} by an unsigned
 * conversion.  In an unsigned conversion to a {@code long}, the
 * high-order 32 bits of the {@code long} are zero and the
 * low-order 32 bits are equal to the bits of the integer
 * argument.   
 */
public static long toUnsignedLong(int x) {
    return ((long) x) & 0xffffffffL;
}

According to the official documentation reproduced above, "the high-order 32 bits of the long are zero and the low-order 32 bits are equal to the bits of the integer argument." I do not see how this follows from the code inside the method body however.

When reading the method, the following is my train of thought for positive x:

  1. When the integer is cast to long, its sign bit / most significant bit is zero. As such, the long's sign bit / most significant bit is zero and the low-order bits are equal to those of the integer.
  2. As the long 0xffffffff has all ones in the lowest-order 4 bytes, and because only these bytes will have data in them, this mask has no effect and the correct result is returned.

When reading it in the context of a negative x however, my understanding falls apart:

  1. When the integer is cst to long, its sign bit / most significant bit is one. As such, the long's sign bit / most significant it is one and the low order bits are equal to those of the integer, except the most significant bit of the fourth least significant byte is zero when it is one in the integer.
  2. As the long 0xffffffff has all ones in the lowest-order 4 bytes and zero in the highest-order four bytes, it has the only effect of changing the sign bit on the long, and keeps the incorrect integer in the four least significant bits intact. As such, it returns a wrong answer from this method, where the sign bit of the integer is changed as it moves into the long.

When I test this method, however, I get results consistent with the Javadoc. I suspect that I am misunderstanding one or more fundamental points about bitwise operators in Java or its two's complement integer representation, and I hope that this question can clarify those points.

john01dav
  • 1,842
  • 1
  • 21
  • 40
  • 0xffffffff isn't all 1s. Each `ff` is one byte, and there are four such pairs. A long is 8 bytes. So it's 4 bytes (32 bits) of 0s and then 4 bytes of 1s. Does that help? – yshavit Apr 10 '19 at 04:01
  • @yshavit That's one misunderstanding (or, rather, brain fart -- I did know about what you are saying, but I didn't apply it for some reason), but it doesn't make everything add up. I have edited the question to correct that misunderstanding. – john01dav Apr 10 '19 at 04:03
  • 4
    When you cast your negative `int` to a `long`, the top 32 bits fill up with ones. Doing the `&` operation changes these back to zeroes. – Dawood ibn Kareem Apr 10 '19 at 04:13
  • Maybe this will help `System.out.printf("%s & %s = %s%n", Long.toBinaryString((long) x), Long.toBinaryString(0xffffffffL), Long.toBinaryString(((long)x) & 0xffffffffL));` – Elliott Frisch Apr 10 '19 at 04:15
  • @DawoodibnKareem If the top 32 bits fill up with ones, then the magnitude of the long will become truly massive, probably near its maximum value. In Java, a cast from an int to a long should maintain the same numeric value. As such, my understanding of what a cast is conflicts with your claim about the top 32 bits filling with ones. Can you clarify? – john01dav Apr 10 '19 at 04:25
  • The biggest bit is negative. It adds up to one more (in magnitude) than all the other bits put together. So the ones _mostly_ cancel each other out. – Dawood ibn Kareem Apr 10 '19 at 04:47
  • 3
    @john01dav Your write-up of the whole situation including what you already know and what you think would happen, all this is excellent. I wish every question here on Stack Overflow would be written in this style. – Roland Illig Apr 10 '19 at 05:28
  • not sure this deserves the language-lawyer tag, but good question nonetheless – CoffeeTableEspresso Apr 15 '19 at 16:43

1 Answers1

6

The bitwise operators work exactly as you would expect. They are strict bit-operators and do not consider semantics of bits at all.

Sometimes it is easiest to run through code using breakpoints. For your specific example, I converted the steps of the operation into atomic statements and printed the results with Long.toString.

int x = -57;

// step 1:
long xCast = (long) x;
System.out.println(Long.toString(xCast, 2)); // -1110011 - this is not the bitwise representation however.

long mask = 0xffffffffL;
System.out.println(Long.toString(mask, 2)); // 11111111111111111111111111111111

// step 2:
long result = ((long) x) & mask;
System.out.println(Long.toString(result, 2)); // 11111111111111111111111111000111

Step 1 is the primary reason the operation looks as it does. In Java, all (strictily numeric) values are signed (chars are unsigned). This means that, as you correctly stated, all highest bits are sign-bits. However the interesting part is what the rest of the bits do, if a number is negative. The following thread already covered the basics of the 'Two's complement': What is “2's Complement”? So does this wikipedia-page: https://en.wikipedia.org/wiki/Two%27s_complement

To make it short, in java, for integers:

int zero = 0; // == 0b00000000_00000000_00000000_00000000

int maxPositive = Integer.MAX_VALUE; // == 0b01111111_11111111_11111111_11111111

int minus1 = -1; // == 0b11111111_11111111_11111111_11111111

int minNegative = Integer.MIN_VALUE; // == 0b10000000_00000000_00000000_00000000

So the reason everything works out is because if the integer is negative, when it is cast, the entire upper 32 bits are converted to 1s, because otherwise the represented value of the number would change. effectively:

int x = 0b11111111_11111111_11111111_11000111;

is cast to:

long xCast = 0b11111111_11111111_11111111_11111111_11111111_11111111_11111111_11000111;

Because you as the developer expect the method to return only the initially set bits, you have to mask the upper bits out of the result. This is done in step 2.

So the answer to your example: The representation of non-floating values in Java are two's complement, and therefore, when smart-casting a value from int to long, the upper bits are filled up with 1s for negative numbers. Thus they have to be removed.

TreffnonX
  • 2,924
  • 15
  • 23