I have a file that contains Chinese characters. I wrote Java code that reads this file and writes it to another file using FileInputStream/FileOutputStream (byte streams), and it works fine. The problem appears when I use the character streams FileReader/FileWriter.

My question is: how can a byte stream, which handles only eight bits at a time, read Chinese characters? As far as I know, a byte stream reads/writes one byte at a time, and one byte can only represent ASCII characters (i.e. only 128 characters). Yet the character stream (FileReader/FileWriter), which uses 16 bits to read/write and should be able to handle a Chinese character, does not read/write the file properly.

Holger
  • You should show us your code, *but* notice that `FileInputStream` does not care what your file stores; it copies raw bytes. When you open the copied file, the tool you are using (let's say Notepad) applies an encoding that transforms those bytes into human-readable text. – Eugene Feb 09 '17 at 05:56
  • @Eugene OK, fine. Now suppose that each Chinese character is stored as two bytes in the file, the byte stream reads and writes one byte at a time to the other file, and finally the tool (Notepad++) transforms it into human-readable form (i.e. it merges two bytes to form a Chinese character). But then what is the problem with the character stream? It also reads and writes two bytes at a time, and the tool (Notepad++) should transform the result into human-readable form. – Utlesh Singh Feb 09 '17 at 06:47
  • Not necessarily two bytes, it could be more; surrogate pairs, for example. Also, you are almost certainly getting the encoding wrong; see here for a hint: http://stackoverflow.com/questions/13350676/how-to-read-write-this-in-utf-8 – Eugene Feb 09 '17 at 09:18

1 Answer

Character encoding (or decoding) only applies when you convert bytes to characters or strings (or back again). FileInputStream and FileOutputStream work with any content because, to them, the data is just bytes, not characters.
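As a rough illustration of why the byte-stream copy works, here is a minimal sketch. The class name `ByteCopyDemo`, the helper `copyBytes`, and the sample text are made up for this example, and it assumes the text is encoded as UTF-8 (the question does not say which encoding the file uses). It copies the bytes one at a time, exactly as a FileInputStream/FileOutputStream loop would, and never looks at what they mean.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteCopyDemo {

    // Copies raw bytes one at a time without interpreting them as text.
    static void copyBytes(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
    }

    public static void main(String[] args) throws IOException {
        // Two Chinese characters, written as Unicode escapes so the source
        // file's own encoding does not matter. UTF-8 is an assumption here.
        String text = "\u4e2d\u6587";
        byte[] original = text.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyBytes(new ByteArrayInputStream(original), sink);
        byte[] copied = sink.toByteArray();

        // Each of these characters takes several bytes in UTF-8.
        System.out.println("bytes: " + original.length);
        // The copy is byte-for-byte identical, so decoding it later still works.
        System.out.println("identical: " + Arrays.equals(original, copied));
        System.out.println("decoded ok: " + new String(copied, StandardCharsets.UTF_8).equals(text));
    }
}
```

No single byte is a Chinese character here; the characters only reappear when a tool or a `String` constructor decodes the whole byte sequence with the right encoding.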

When you read a file as characters using FileReader and write it with FileWriter, you have to take character encoding into consideration. Look at the following Javadoc from FileReader:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream.

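Now if you are using FileReader (and similarly FileWriter), it picks up the default encoding of the system it is running on (locale-based). This matters especially on Windows, where the default is often not UTF-8. You could:

  1. Explicitly set the file.encoding system property to "UTF-8" (for example with -Dfile.encoding=UTF-8 when starting the JVM)
  2. Construct your own InputStreamReader on a FileInputStream (and an OutputStreamWriter on a FileOutputStream) with the proper encoding.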
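The second option could look roughly like the sketch below. The class name `CharsetCopyDemo`, the helper `copyText`, and the temporary file names are hypothetical, and UTF-8 is only an assumption; you would pass whatever encoding your file was actually saved in.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetCopyDemo {

    // Copies a text file, decoding and re-encoding with an explicit charset
    // instead of relying on the platform default used by FileReader/FileWriter.
    static void copyText(Path source, Path target, Charset charset) throws IOException {
        try (Reader in = new InputStreamReader(new FileInputStream(source.toFile()), charset);
             Writer out = new OutputStreamWriter(new FileOutputStream(target.toFile()), charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real input and output files.
        Path source = Files.createTempFile("chinese-in", ".txt");
        Path target = Files.createTempFile("chinese-out", ".txt");

        // Sample text as Unicode escapes; assumed to be stored as UTF-8.
        String text = "\u4e2d\u6587";
        Files.write(source, text.getBytes(StandardCharsets.UTF_8));

        copyText(source, target, StandardCharsets.UTF_8);

        String copied = new String(Files.readAllBytes(target), StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + copied.equals(text));
    }
}
```

If the charset passed to `copyText` does not match the encoding the file was written in, the decoding step will produce wrong characters, which is most likely what happens with FileReader when the platform default differs from the file's encoding.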

Hope this helps

GauravJ
  • To emphasize: if you are doing text processing, do not use FileInputStream (unless you have some need to avoid Java's text processing classes). – Tom Blodget Feb 09 '17 at 17:33