59

I was initially surprised that Java decides to specify that byte is signed, with a range from -128..127 (inclusive). I'm under the impression that most 8-bit number representations are unsigned, with a range of 0..255 instead (e.g. IPv4 in dot-decimal notation).

So has James Gosling ever been asked to explain why he decided that byte is signed? Have there been any notable discussions or debates about this issue among authoritative programming language designers and/or critics?

polygenelubricants
  • 17
    Good question. This is a truly useless datatype. – Thorbjørn Ravn Andersen Jun 24 '10 at 08:30
  • 12
    *"I'm under the impression that most 8-bit number representations are unsigned..."* Well, the ones called "byte" usually are, yeah. The ones called "char" tend not to be. Gosling wanted to keep everything signed, which is fair 'nuff, though I really wish he'd gone with a different name for the 8-bit signed number. (But then, I really wish he'd had unsigned numbers in the flippin' language, as well.) – T.J. Crowder Jun 24 '10 at 08:49
  • 2
    *"I'm under the impression that most 8-bit number representations are unsigned..."* In the C standard, whether a `char` is signed or unsigned is implementation defined - but in most C implementations, `char`s are signed. – Artelius Jun 24 '10 at 08:57
  • 2
    Stop whining about Java not having unsigned bytes, it is not a problem. The only minor inconvenience is that you need casts to byte for hex constant tables. For everything else see http://stackoverflow.com/questions/397867/port-of-random-generator-from-c-to-java/397997#397997 – starblue Jun 24 '10 at 09:10
  • 1
    @T.J, char is a 16 bit, not 8 bit. – Thorbjørn Ravn Andersen Jan 16 '11 at 11:57
  • @Thorbjørn: Quite, good point. – T.J. Crowder Jan 16 '11 at 14:16
  • 14
    @starblue Huh? Obviously you've never had to write any byte-handling code in Java (parsers, encoders)... this is a major PITA. And possibly a performance problem too, depending on how jvm is able to optimize away casts (or not). Btw, problem is not even limited to just byte, but also to short, int, with respect to sign extension. – StaxMan Oct 04 '11 at 21:23
  • I'd care to guess that "char" is signed because it was signed in hardware in the PDP-11. Memory was real tight, and small values in signed (16-bit) int registers could be stored and loaded as 8 bits. "C" was first developed on the PDP-11. I can guess that Java et al inherited this for no good reason. Railroad gauges and Roman roads may be urban legend. But this one is plausible. – Mischa Jan 29 '15 at 19:04
  • 7
    @starblue Try refactoring a Java implementation of a byte-based communications protocol then tell me that unsigned bytes are not a problem! Nightmare more like – Chris Hatton Feb 18 '15 at 11:52
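The two pain points raised in the comments above (casts in hex constant tables, and sign extension when a byte is widened) can be illustrated with a minimal sketch. The class name `HexTableSketch`, the `MAGIC` table, and the `unsigned` helper are hypothetical names chosen for illustration; they do not come from the discussion.

```java
// Hypothetical illustration; the names here are not from the original thread.
final class HexTableSketch {
    // Literals above 0x7F are int values outside byte's -128..127 range,
    // so each one needs an explicit cast to compile.
    static final byte[] MAGIC = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE };

    // Widening a byte to int sign-extends it; masking with 0xFF
    // recovers the 0..255 value that the hex literal suggested.
    static int unsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        for (byte b : MAGIC) {
            System.out.println("signed=" + b + " unsigned=" + unsigned(b));
        }
    }
}
```

Whether this counts as a "minor inconvenience" or a "major PITA" is exactly what the commenters disagree about; the sketch only shows where the casts and masks end up.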

4 Answers

33

It appears that simplicity was the main reason. From this interview:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

My initial assumption was that it's because Java doesn't have unsigned numeric types at all. Why should `byte` be an exception? `char` is a special case because it has to represent UTF-16 code units (thanks to Jon Skeet for the quote).

Joey
Bozho
  • 1
    char isn't even numeric, is it? so char is neither signed, nor unsigned. – unbeli Jun 24 '10 at 08:37
  • 3
    @unbeli: It's a numeric type in the language specification. – Jon Skeet Jun 24 '10 at 08:39
  • @unbeli char is an integral type, just like byte, short, int and long: http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#4.2.1 – Pete Kirkham Jun 24 '10 at 08:40
  • 12
    From section 4.2: The numeric types are the integral types and the floating-point types. The integral types are byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit and 64-bit signed two's-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing UTF-16 code units (§3.1). – Jon Skeet Jun 24 '10 at 08:40
  • 6
    I understand why they might do that for ints, but aren't unsigned bytes from 0 to 255 simpler? – Roman A. Taycher Jun 24 '10 at 08:44
  • 5
    @Roman: I've observed many questions on stackoverflow regarding `byte` level manipulation (something Bloch recommends AGAINST perhaps precisely because...); they're really tricky to get right because of sign extension. Fortunately most of these can be hidden away in libraries, but it would be nice if the language elements themselves aren't so tricky to begin with. – polygenelubricants Jun 24 '10 at 09:52
  • 2
    "`char` is a special case". Hm, `byte` isn't? – Dávid Horváth Feb 09 '21 at 16:16
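The JLS passage quoted in the comments above can be checked directly. The following is a small sketch, assuming nothing beyond the standard `MIN_VALUE`/`MAX_VALUE` constants; the class name `IntegralRanges` and the `widen` helpers are hypothetical names used only for illustration.

```java
// Hypothetical illustration of the signed integral types versus unsigned char.
final class IntegralRanges {
    // Widening a byte to int sign-extends (the top bit is treated as a sign).
    static int widen(byte b) {
        return b;
    }

    // Widening a char to int zero-extends, because char is unsigned.
    static int widen(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + ".." + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + ".." + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + ".." + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + ".." + Long.MAX_VALUE);
        System.out.println("char:  " + (int) Character.MIN_VALUE + ".." + (int) Character.MAX_VALUE);

        // Same all-ones bit pattern, different interpretation after widening.
        System.out.println(widen((byte) 0xFF));   // -1
        System.out.println(widen((char) 0xFFFF)); // 65535
    }
}
```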
17

According to the 'Oak Language Specification 0.2' (Oak being the original name of the Java language):

"The Oak byte type is what C programmers are used to thinking of as the char type. But in the Oak language, characters are 16 bits wide. Having a separate byte type removes the confusion in C between the interpretation of char as an 8 bit integer and as a character."

You can grab a PostScript copy here:

http://cretesoft.com/archive/files/OakSpec0.2.ps (partial copy on scribd)

Part of an interview in which he defends the absence of unsigned byte in Java is also posted on this site:

http://www.darksleep.com/player/JavaAndUnsignedTypes.html

Adding the interview taken from the above mentioned page...

*" http://www.gotw.ca/publications/c_family_interview.htm

Q: Programmers often talk about the advantages and disadvantages of programming in a "simple language." What does that phrase mean to you, and is [C/C++/Java] a simple language in your view?

Ritchie: [deleted for brevity]

Stroustrup: [deleted for brevity]

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

On the other hand.... According to http://www.artima.com/weblogs/viewpost.jsp?thread=7555

Once Upon an Oak ... by Heinz Kabutz July 15, 2003

... Trying to fill my gaps of Java's history, I started digging around on Sun's website, and eventually stumbled across the Oak Language Specification for Oak version 0.2. Oak was the original name of what is now commonly known as Java, and this manual is the oldest manual available for Oak (i.e. Java). ... Unsigned integer values (Section 3.1)

The specification says: "The four integer types of widths of 8, 16, 32 and 64 bits, and are signed unless prefixed by the unsigned modifier."

In the sidebar it says: "unsigned isn't implemented yet; it might never be." How right you were. "*

Joachim Sauer
Favonius
8

I'm not aware of any direct quotes from James Gosling, but there's an official RFE for unsigned byte:

Bug ID: 4186775: request unsigned integer types, esp. unsigned byte

State: 11-Closed, Will Not Fix, request for enhancement

Please extend the Java design to allow unsigned types, particularly unsigned byte.

I have been wondering why there are no unsigned integer types in Java. It seems to me that for byte-length values it is extremely awkward not to have them [...]

I recognize that this was a design decision made by the Java developers. What I don't understand is why. Did they consider unsigned integer types evil or harmful, and chose to protect me from myself?

Aneri
polygenelubricants
0

There's no reason for a byte to be unsigned. When you have the char type to represent characters, byte would not normally do the job of a char.

this. __curious_geek
  • I believe Java chars are UCS-2 and stored in 16 bits/2 bytes. Even if that weren't the case, I have always felt it to be an ugly type wart that C has no native byte type (yes, I know a char is a byte, but even for C it feels like playing too loose with types). – Roman A. Taycher Jun 24 '10 at 12:25
  • 2
    There are a lot of reasons for byte to be unsigned when you are working with bitwise operations. – j2gl Sep 17 '14 at 03:51
  • 1
    Example: is this true or false? `byte b = (byte) 200; System.out.println(b > 100);` To fix it, you need to end almost all such expressions with `& 0xFF`: `byte b = (byte) 200; System.out.println((b & 0xFF) > 100);` This link explains why in more detail: http://nayuki.eigenstate.org/page/javas-signed-byte-type-is-a-mistake – j2gl Sep 17 '14 at 04:10
  • 5
    There's far less reason for it to be signed. Try writing code to convert a big-endian sequence of four signed bytes to an "int". Now do likewise with four unsigned bytes. Which is clearer? Types which are used as building blocks for larger values (as bytes often are, especially when reading and writing files) should be unsigned. Having them signed makes things more complicated. – supercat Sep 26 '16 at 22:26
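The comparison pitfall and the big-endian conversion described in the comments above can be sketched as follows. This is only an illustration under stated assumptions: the class name `UnsignedByteSketch` and all method names are hypothetical, while `Byte.toUnsignedInt` (Java 8 and later) and `ByteBuffer.getInt` are standard library calls.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; method names are not from the original thread.
final class UnsignedByteSketch {
    // (byte) 200 is stored as -56, so the naive comparison is false.
    static boolean greaterThan100Naive(byte b) {
        return b > 100;
    }

    // Masking with 0xFF compares the intended 0..255 value.
    static boolean greaterThan100Masked(byte b) {
        return (b & 0xFF) > 100;
    }

    // Correct: every byte is masked before shifting, so no sign bits leak.
    static int bigEndianToInt(byte[] src) {
        return ((src[0] & 0xFF) << 24)
             | ((src[1] & 0xFF) << 16)
             | ((src[2] & 0xFF) << 8)
             |  (src[3] & 0xFF);
    }

    // Broken on purpose: a negative low byte sign-extends to 32 bits
    // and overwrites the higher bytes when OR-ed in.
    static int bigEndianToIntUnmasked(byte[] src) {
        return (src[0] << 24) | (src[1] << 16) | (src[2] << 8) | src[3];
    }

    public static void main(String[] args) {
        byte b = (byte) 200;
        System.out.println(greaterThan100Naive(b));   // false
        System.out.println(greaterThan100Masked(b));  // true
        System.out.println(Byte.toUnsignedInt(b));    // 200

        byte[] src = { 0x00, 0x00, 0x01, (byte) 0x80 };
        System.out.println(bigEndianToInt(src));          // 384
        System.out.println(bigEndianToIntUnmasked(src));  // -128
        // ByteBuffer reads big-endian by default and hides the masking.
        System.out.println(ByteBuffer.wrap(src).getInt());
    }
}
```

In practice, library helpers such as `ByteBuffer` keep most of this masking out of application code, which matches the earlier remark that these details can often be hidden away in libraries.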