4

Java char is a 16 bit data type, but is it signed or unsigned when it comes to performing arithmetic on it?

Can you use it as an unsigned 16 bit integer in arithmetic?

For example, is the following correct?

char c1;
char c2;

int i = c1 << 16 | c2;

Or is it necessary to strip the sign-extended bits off c2 first?

(I am sure the answer to this is elsewhere, but it doesn't seem to be picked up by obvious searches.)

T.J. Crowder
rghome
  • Chars are unsigned but they are promoted to ints before bit shifting – Michael Feb 28 '19 at 10:58
  • https://stackoverflow.com/a/21089624/3977134 – r3dst0rm Feb 28 '19 at 10:59
  • @Michael I assume then that the promotion does not involve extension of the top bit (since there is no sign bit). – rghome Feb 28 '19 at 11:02
  • @r3dst0rm given that the second answer is essentially wrong in that question (which is pointed out in a comment), I think the issue needs clarifying (hence this question). – rghome Feb 28 '19 at 11:07

2 Answers

8

char is unsigned. From JLS§4.2.1:

For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

...but note that when you use them in arithmetic, bitwise, or shift operations, they're first promoted to a wider type, and that type may well be signed. For shift operators each operand is promoted on its own, so a char operand always becomes int; for the other binary operators the result type depends on the other operand (binary numeric promotion):

  1. Widening primitive conversion (§5.1.2) is applied to convert either or both operands as specified by the following rules:

    • If either operand is of type double, the other is converted to double.

    • Otherwise, if either operand is of type float, the other is converted to float.

    • Otherwise, if either operand is of type long, the other is converted to long.

    • Otherwise, both operands are converted to type int.
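The rules above can be observed with a small sketch. The class name PromotionDemo and the overloaded typeOf helper are hypothetical, introduced only to report the static type the compiler assigns to each expression:

```java
class PromotionDemo {
    // Overload resolution picks the method matching the expression's static type.
    static String typeOf(char x)   { return "char"; }
    static String typeOf(int x)    { return "int"; }
    static String typeOf(long x)   { return "long"; }
    static String typeOf(float x)  { return "float"; }
    static String typeOf(double x) { return "double"; }

    public static void main(String[] args) {
        char c = 'A';
        System.out.println(typeOf(c));         // char
        System.out.println(typeOf(c + c));     // int
        System.out.println(typeOf(c + 1L));    // long
        System.out.println(typeOf(c * 1.0f));  // float
        System.out.println(typeOf(c / 2.0));   // double
        // Shift operators promote each operand separately,
        // so the result type follows the left operand only.
        System.out.println(typeOf(c << 1L));   // int
    }
}
```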

For instance, char + char is int, so:

public class Example {
    public static void main(String[] args) {
        char a = 1;
        char b = 2;

        char c = a + b;          // error: incompatible types: possible lossy conversion from int to char
        System.out.println(c);
    }
}
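One way to make the example compile, as a minimal sketch, is to cast the int result back to char explicitly. The class name CastExample and the add helper are illustrative only:

```java
class CastExample {
    // a + b is evaluated as int; the cast narrows the result back to 16 bits.
    static char add(char a, char b) {
        return (char) (a + b);
    }

    public static void main(String[] args) {
        char c = add((char) 1, (char) 2);
        System.out.println((int) c);                        // 3
        // The narrowing cast keeps only the low 16 bits, so 0xFFFF + 1 wraps to 0.
        System.out.println((int) add('\uFFFF', (char) 1));  // 0
    }
}
```

The cast discards any bits above bit 15, so the result wraps modulo 65536 instead of raising an error.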

Re bit-extension, if we follow the link above to widening primitive conversion:

A widening conversion of a char to an integral type T zero-extends the representation of the char value to fill the wider format.

So char 0xFFFF becomes int 0x0000FFFF, not 0xFFFFFFFF.
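Applied to the expression in the question, a short sketch (the class name CharWidening and the pack helper are assumptions for illustration) contrasts char with short, which is sign-extended, and shows that no masking of the low char is needed:

```java
class CharWidening {
    // Packs two chars into one int: hi in the upper 16 bits, lo in the lower 16.
    // Both operands are zero-extended to int, so lo needs no "& 0xFFFF" mask.
    static int pack(char hi, char lo) {
        return hi << 16 | lo;
    }

    public static void main(String[] args) {
        char c = '\uFFFF';
        short s = (short) 0xFFFF;

        int fromChar = c;   // zero-extended
        int fromShort = s;  // sign-extended

        System.out.println(Integer.toHexString(fromChar));   // ffff
        System.out.println(Integer.toHexString(fromShort));  // ffffffff
        System.out.println(Integer.toHexString(pack('\u1234', '\uFFFF')));  // 1234ffff
    }
}
```

The resulting int is still a signed type, so a high char of 0x8000 or above yields a negative value, although the bit pattern is the one intended.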

T.J. Crowder
  • Thanks: I feel the way the numeric range is specified (as Unicode escapes) in the documentation doesn't leave it unambiguously clear that this is how chars are treated arithmetically. The key question is how the widening happens. Is bit 15 propagated or not? – rghome Feb 28 '19 at 11:04
  • @rghome - Updated the end of the answer. No, it's 0-extended. – T.J. Crowder Feb 28 '19 at 11:11
1

From the specs

For char, from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

Since char is 16 bits wide and its range runs from 0 to 65535, it is unsigned.

Federico klez Culloca