16

Title is pretty self-explanatory. In a lot of the JRE javadocs I see the phrases "stream of bytes" and "stream of characters" all over the place.

But aren't they the same thing? Or are they slightly different (e.g. interpreted differently) in Java-land? Thanks in advance.

IAmYourFaja

4 Answers

25

In Java, a byte is not the same thing as a char. Therefore a byte stream is different from a character stream. Bytes are intended for arbitrary binary data; characters are specifically for data representing the building blocks of strings.

but if a char is only 1 byte in width

Except that it's not. As per the JLS §4.2.1, a char is a number in the range:

from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

But a byte is a number in the range

from -128 to 127, inclusive
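To check those ranges yourself, here is a minimal sketch that prints the limits straight from the wrapper classes. The class name `CharVsByteRange` and its helper method are made up for illustration; the constants (`Character.MIN_VALUE`, `Character.SIZE`, `Byte.MAX_VALUE`, and so on) are standard.

```java
class CharVsByteRange {
    /** Width of a char in bytes, derived from the bit sizes the JDK reports. */
    static int charWidthInBytes() {
        return Character.SIZE / Byte.SIZE;
    }

    public static void main(String[] args) {
        // char is an unsigned 16-bit value, so its limits are cast to int for printing
        System.out.println("char range: " + (int) Character.MIN_VALUE
                + " to " + (int) Character.MAX_VALUE);   // 0 to 65535
        // byte is a signed 8-bit value
        System.out.println("byte range: " + Byte.MIN_VALUE
                + " to " + Byte.MAX_VALUE);              // -128 to 127
        System.out.println("char width in bytes: " + charWidthInBytes()); // 2
    }
}
```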

Matt Ball
  • 1
    Thanks @Matt Ball - I understand they are different as far as types go (`byte`, `char`, etc.), but if a `char` is only 1 byte in width, then what's different about storing an input stream as a byte array vs char array? That was at the root of my question. – IAmYourFaja Feb 26 '13 at 20:07
  • 4
    Who says a `char` is only 1 byte in width? http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html – Matt Ball Feb 26 '13 at 20:09
8

A stream of bytes is just plain bytes, the same thing you would see when you open a file in a hex editor.

A character is different from a plain byte. ASCII uses exactly 1 byte per character, but that is not true of many other encodings. For example, UTF-8 uses from 1 to 4 bytes to encode a single character. A character stream is designed to abstract away the underlying encoding and produce chars in a single encoding (in Java, char and String use UTF-16).
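A quick way to see the byte counts vary is to encode a few strings and measure them. This is only a sketch: `EncodedLengths` and its two helpers are names invented for the example, and `UTF_16BE` is chosen so that no byte-order mark is added to the count.

```java
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Number of bytes the text occupies as big-endian UTF-16 (no byte-order mark). */
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Latin letter A, e with acute accent, euro sign, and an emoji outside the BMP
        String[] samples = {"A", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(s.length() + " char(s), "
                    + utf8Length(s) + " UTF-8 byte(s), "
                    + utf16Length(s) + " UTF-16 byte(s)");
        }
    }
}
```

The four samples take 1, 2, 3 and 4 bytes in UTF-8. The last one also takes two Java chars (a surrogate pair), so even a char is not always a whole character.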

As a rule of thumb:

  • When you are dealing with text, use a character stream to decode the bytes into characters with the appropriate encoding.

  • When you are dealing with binary data, or a mix of binary data and text, use a byte stream, since decoding arbitrary bytes as characters doesn't make sense. If a sequence of bytes represents a String in a certain encoding, you can always pick those bytes out and use the String(byte[] bytes, Charset charset) constructor to get back the String.
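Both routes from the rule of thumb can be sketched in a few lines. The class `DecodeBytes`, the helper `readAll` and the sample text are assumptions for illustration; `InputStreamReader(InputStream, Charset)` and `String(byte[], Charset)` are the standard JDK calls.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeBytes {
    /** Wraps a byte stream in a character stream and reads it to the end. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // 5 characters, but 6 bytes in UTF-8 (the i with diaeresis needs 2 bytes)
        byte[] raw = "na\u00efve".getBytes(StandardCharsets.UTF_8);

        // Text route: let a character stream do the decoding
        String viaReader = readAll(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);

        // Binary route: pick the bytes out and decode them explicitly
        String viaConstructor = new String(raw, StandardCharsets.UTF_8);

        System.out.println(viaReader.equals(viaConstructor));

        // Decoding with the wrong charset yields a different (garbled) string
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(viaReader.length() + " vs " + wrong.length());
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, keeps the result the same across machines.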

nhahtdh
5

They are different. char is a 2-byte datatype in Java; byte is a 1-byte datatype.

Edit: char is also an unsigned type, while byte is not.
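The signedness difference shows up as soon as the top bit is set. A small sketch, with `SignedVsUnsigned` and `toUnsigned` as made-up names for the example:

```java
class SignedVsUnsigned {
    /** Reinterprets a signed byte as its unsigned value in the range 0..255. */
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;    // all 8 bits set
        char c = (char) 0xFFFF;  // all 16 bits set

        System.out.println(b);             // -1, because byte is signed
        System.out.println(toUnsigned(b)); // 255, after masking off the sign extension
        System.out.println((int) c);       // 65535, because char is unsigned
    }
}
```

The `& 0xFF` mask is the usual way to treat bytes read from a stream as values from 0 to 255.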

user207421
pamphlet
0

Generally it is better to talk about streams in terms of their sizes rather than what they carry. A stream of bytes is more intuitive than a stream of chars, because with a stream of chars we have to double-check that a char is indeed a byte, not a Unicode char or anything fancy.

A char is a representation, which can be represented by a byte, but a byte is always going to be a byte. All world will burn when bytes will stop being 8 bits.

Dmytro
  • _"All world will burn when bytes will stop being 8 bits."_ Hardly. http://en.wikipedia.org/wiki/Byte#History – Matt Ball Feb 26 '13 at 20:23