One byte should be enough to store a character, so why does Java use 2 bytes for a char? Another thing confuses me: FileInputStream does all of its operations byte-wise, so how can it read characters?
-
1 byte is enough for storing ASCII characters. Java uses Unicode, not ASCII. – BackSlash Jan 19 '14 at 11:11
-
One byte doesn't fly because sadly not everyone communicates in the same language. Some [background reading](http://www.joelonsoftware.com/articles/Unicode.html) for you. – avik Jan 19 '14 at 11:22
2 Answers
The Java char datatype is 16 bits; a byte is 8 bits.
This is because Java Strings are Unicode strings, not ASCII ones, which allows standard Java Strings to hold text in most languages worldwide.
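A minimal sketch of the size difference, assuming nothing beyond the standard library. The class name `CharSizeDemo` and its helper methods are made up for illustration; the euro sign is written as the escape `\u20AC` so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the sizes of byte and char,
// and the number of bytes one character needs in two encodings.
class CharSizeDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as UTF-16 (big-endian, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        System.out.println(Byte.SIZE);              // 8
        System.out.println(Character.SIZE);         // 16
        System.out.println(utf8Length("$"));        // 1
        System.out.println(utf8Length("\u20AC"));   // 3
        System.out.println(utf16Length("\u20AC"));  // 2
    }
}
```

The euro sign fits in a single 16-bit char but not in a single byte, which is the practical reason a one-byte character type would not be enough.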

FileInputStream (as well as other classes inherited from InputStream) is indeed "byte-oriented"; on its own it is not suitable for reading character data.
If you need to read a text file, you should probably try this:
new InputStreamReader(new FileInputStream(file), "UTF8")
You'll need to know the file's encoding beforehand, of course.
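As a rough sketch of how that reader might be used to load a whole file, assuming the file really is UTF-8: the class name `ReadTextFile` and the method `readUtf8` are hypothetical, and `StandardCharsets.UTF_8` is used in place of the string name so no checked `UnsupportedEncodingException` is involved.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a file as text through an InputStreamReader.
class ReadTextFile {
    // Reads the whole file, decoding its bytes as UTF-8 (an assumption about the file).
    static String readUtf8(String path) throws IOException {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader and the underlying stream.
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            char[] buffer = new char[4096];
            int n;
            // read() returns the number of chars decoded, or -1 at end of stream.
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readUtf8(args[0]));
    }
}
```

The loop works on a char buffer rather than a byte buffer, so the byte-to-character conversion is left entirely to the reader.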
If you just need to read the file into a string and the file is not exceptionally big, the easiest way would be to call FileUtils.readFileToString. See the Apache Commons IO javadoc for more information.
Update-201301191245: To those who naively think that they can read bytes from a file into a fixed-size byte array and then convert each filled array to a string: this will not work reliably for UTF-8, because the data may contain multibyte characters that get split across two reads. Consider the following:
- Dollar sign ("$", U+0024) occupies only one byte in UTF-8: 24
- Euro sign ("€", U+20AC) occupies three bytes in UTF-8: E2 82 AC
Imagine the situation:
Suppose you read "E2 82" into the end of the fixed-size buffer, and "AC" is left to be read on the next read cycle. When you try to convert the bytes "E2 82" to Java characters, the result will be damaged text data.
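The situation can be reproduced with a small sketch. The class `SplitCharDemo` and both method names are invented for illustration, and an in-memory `ByteArrayInputStream` stands in for the file. The input is "$€" (bytes 24 E2 82 AC) read in chunks of 3 bytes, so the euro sign is split exactly as described.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decoding each byte chunk separately vs. decoding through a Reader.
class SplitCharDemo {
    // Naive approach: every chunk is converted to a String on its own,
    // so a multibyte sequence cut at a chunk boundary is decoded as malformed input.
    static String decodePerChunk(byte[] data, int chunkSize) throws IOException {
        StringBuilder out = new StringBuilder();
        InputStream in = new ByteArrayInputStream(data);
        byte[] buffer = new byte[chunkSize];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.append(new String(buffer, 0, n, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    // Reader approach: InputStreamReader keeps the incomplete sequence
    // until the remaining bytes arrive, so characters come out intact.
    static String decodeWithReader(byte[] data, int chunkSize) throws IOException {
        StringBuilder out = new StringBuilder();
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        char[] buffer = new char[chunkSize];
        int n;
        while ((n = reader.read(buffer)) != -1) {
            out.append(buffer, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "$\u20AC".getBytes(StandardCharsets.UTF_8); // 24 E2 82 AC
        System.out.println(decodePerChunk(data, 3).equals("$\u20AC"));   // false
        System.out.println(decodeWithReader(data, 3).equals("$\u20AC")); // true
    }
}
```

In the per-chunk version the damaged part shows up as U+FFFD replacement characters, since `new String(byte[], int, int, Charset)` substitutes the charset's replacement string for malformed input. Reading all the bytes first and converting once avoids the split as well, at the cost of holding the whole file in memory.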

-
**All** streams are "byte-oriented" ... bytes are bytes. The only difference between the classes is the convenience methods they expose to coerce those bytes into something else (and/or provide buffering). You can easily read an array of bytes from `FileInputStream` and convert them to a `String` without introducing a giant 3rd party dependency. – Brian Roach Jan 19 '14 at 11:30
-
I agree that all streams are "byte-oriented" and will correct my answer. I disagree with this: "read an array of bytes from FileInputStream and convert them to a String". This would not work for the case of composite (multibyte) characters. Consider the euro sign U+20AC: it is encoded with 3 bytes in UTF-8: E2 82 AC. – akhikhl Jan 19 '14 at 11:44
-
You really don't seem to understand how data makes its way between computers / files, or how strings are actually represented/encoded in Java. Hint: it's bytes. `new String(byteArray, Charset.forName("UTF-8"));` – Brian Roach Jan 19 '14 at 11:48
-
Please, read my update above. The fact that you never faced the problem with multibyte characters does not mean the problem does not exist. – akhikhl Jan 19 '14 at 11:52
-
@BrianRoach, how would you approach the multibyte character reading problem? – akhikhl Jan 19 '14 at 11:54
-
The same way the convenience methods supplied by the various higher level streams and stream wrappers do. *Of course* you can't decode a partial read (You edited to specify that). The 3rd party lib and method you cite simply reads *all the bytes* before performing the decoding, which is the most naive approach (which honestly is what I was commenting on - the `String` constructor will happily do this for you). If you're trying to read lines at a time you need to know what the line separator is (say, 0x0A || 0x0D || 0x0D 0x0A) which is how, for example, `BufferedReader.readLine()` works. – Brian Roach Jan 19 '14 at 12:07
-
My point being: it's good to know how things *actually* work, then use abstractions/helpers where appropriate. Unless you understand the underlying mechanics, the "appropriate" part is very difficult to judge. – Brian Roach Jan 19 '14 at 12:13