If we have a character in our text file which is in Unicode, mustn't it be 2 bytes of data? But the `read()` method reads one byte at a time as an `int`. So if we have a `FileInputStream` object `fin` and we invoke `int x = fin.read()` once, how do we get the full character back upon `System.out.println(x)` if only one byte has been read? (The `fin.read()` is not in a `while` loop or anything; it is just called once.)

- It reads one byte at a time because it is specified to read one byte at a time. If you need to deal with Strings nicely, you will have to use a decorator. – Coderino Javarino Jul 13 '16 at 06:05
- Not an exact duplicate, but the linked question's accepted answer explains the difference between reading bytes and characters. – Cameron Skinner Jul 13 '16 at 06:05
2 Answers
Good question! You're right that in Java characters are always two bytes, but that isn't true elsewhere (e.g. in the contents of a file).
A file is not encoded "in Unicode", because Unicode is a specification, not an encoding. Encodings map Unicode characters (code points) to specific byte sequences, and not all such encodings use two-byte characters. A Java `char` is a UTF-16 code unit, which is always two bytes wide (characters outside the Basic Multilingual Plane, such as most emoji, need two `char`s), but many files are stored as UTF-8, which is variable-width: ASCII characters take one byte, others take two to four.
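To make the width difference concrete, here is a small sketch (the class and method names are made up for illustration) that encodes a few sample characters with the standard `String.getBytes(Charset)` and prints how many bytes each takes in UTF-8 and UTF-16, plus how many Java `char`s it occupies. UTF-16 is measured with `UTF_16BE` so no byte-order mark is counted, and the samples are written as Unicode escapes so the source file stays pure ASCII.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWidths {
    // Bytes needed to store s in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to store s in UTF-16 (big-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Samples: 'A' (U+0041), e-acute (U+00E9), euro sign (U+20AC),
        // grinning face emoji (U+1F600, stored as a surrogate pair in Java).
        String[] samples = {"A", "\u00e9", "\u20ac", "\ud83d\ude00"};
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 %d bytes, UTF-16 %d bytes, %d Java char(s)%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s), s.length());
        }
    }
}
```

The UTF-8 sizes come out as 1, 2, 3 and 4 bytes, while UTF-16 uses 2 bytes for the first three and 4 bytes (two `char`s) for the emoji.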
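More to the point, however, `InputStream` is designed to read binary data, not characters, and its `read()` method returns a single byte as an `int` from 0 to 255 (or -1 at the end of the stream). So the single `fin.read()` in the question returns only the first byte of a multi-byte character, and `System.out.println(x)` prints that byte's numeric value, not the character. If you want to read text, wrap your stream in a `Reader` such as `InputStreamReader` (preferably specifying the encoding explicitly) to convert the binary data into text. Internally, the reader consumes as many bytes as the encoding requires (usually buffering ahead) to construct each character from the byte sequence.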
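A minimal sketch of the question's scenario, assuming UTF-8 content. To stay self-contained it uses a `ByteArrayInputStream` in place of the question's `FileInputStream` (both are `InputStream`s and behave the same for `read()`); the class and helper names are hypothetical. One `InputStream.read()` call returns just the first byte of "é", while one `Reader.read()` call on an `InputStreamReader` returns the whole decoded character.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteVsCharRead {
    // The UTF-8 encoding of U+00E9 (e-acute): two bytes, 0xC3 0xA9.
    static byte[] sample() {
        return "\u00e9".getBytes(StandardCharsets.UTF_8);
    }

    // A single InputStream.read() call, as in the question: returns ONE byte
    // as an int in the range 0..255, or -1 at end of stream.
    static int readOneByte(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A single Reader.read() call: InputStreamReader reads bytes from the stream
    // (it may buffer ahead) and decodes them as UTF-8 into one char, returned as an int.
    static int readOneChar(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = sample();
        System.out.println(readOneByte(data));        // 195 (0xC3): only the first byte
        System.out.println(readOneChar(data));        // 233 (0xE9): the whole character
        System.out.println((char) readOneChar(data)); // the character itself, if the console can display it
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, is what makes the decoding predictable.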

Streams (`InputStream`s) are for reading bytes, not characters. If you want to read characters, use a `Reader`. The `Reader` reads characters one at a time and handles decoding them from bytes: depending on the character encoding (and the character itself), a character can be encoded in a single byte, two, or even more.
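As a rough illustration of that decoding (class and method names are hypothetical), the sketch below calls `Reader.read()` in a loop until it returns -1. Three characters that occupy 6 bytes in UTF-8 come back as 3 characters; decoding the same bytes with the wrong encoding (ISO-8859-1, which maps every byte to its own character) yields 6 characters instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReaderLoop {
    // Reads characters one Reader.read() call at a time until end of stream (-1),
    // decoding the bytes with the given charset.
    static List<Integer> readAllChars(byte[] data, Charset charset) {
        List<Integer> chars = new ArrayList<>();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                chars.add(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chars;
    }

    public static void main(String[] args) {
        // 'A', e-acute and the euro sign: 1 + 2 + 3 = 6 bytes in UTF-8.
        byte[] utf8 = "A\u00e9\u20ac".getBytes(StandardCharsets.UTF_8);

        List<Integer> right = readAllChars(utf8, StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes -> " + right.size() + " chars"); // 6 bytes -> 3 chars

        // Same bytes, wrong charset: ISO-8859-1 turns every byte into its own char.
        List<Integer> wrong = readAllChars(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length + " bytes -> " + wrong.size() + " chars"); // 6 bytes -> 6 chars
    }
}
```

One caveat: `Reader.read()` returns a Java `char`, so a character outside the Basic Multilingual Plane (for example U+1F600) comes back from two `read()` calls, as a surrogate pair.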
