
I am using "FileInputStream" and "FileReader" to read data from a file that contains Unicode characters.

When I set the default encoding to "cp-1252", both read junk data; when I set the default encoding to UTF-8, both read the file correctly.

  1. Is it true that both of these use the system default encoding to read the data?
  2. If so, what is the benefit of using a character stream if it depends on the system encoding?
  3. Is there any way apart from:

     BufferedReader fis = new BufferedReader(new InputStreamReader(new FileInputStream("some unicode file"),"UTF-8"));
    

    to read the data correctly when the default encoding is something other than UTF-8?

Ritesh Kaushik
  • Why is setting the encoding by hand a bad option for you? – Aleksander Gralak Dec 14 '12 at 09:08
  • @Aleksander Gralak We can always do that; I want to know how to do it programmatically, even if the default encoding is different. – Ritesh Kaushik Dec 14 '12 at 09:10
  • But you are doing it programmatically. It is hardcoded, but it is in the source code. If you want to choose it at runtime, read the charset name from some kind of properties file. Sorry, I just do not understand what your problem is here. – Aleksander Gralak Dec 14 '12 at 09:12
  • Look at this http://stackoverflow.com/questions/9181530/auto-detect-character-encoding-in-java – Jan Krakora Dec 14 '12 at 09:15
  • `FileInputStream` reads raw octets (bytes). It has no concept of character encoding. `FileReader` transcodes data from the default encoding to UTF-16 chars. The default encoding is a legacy of the 1990s; Unicode encodings should be preferred, and types/methods that rely on the default encoding should be used only reluctantly. – McDowell Dec 14 '12 at 09:26

1 Answer


FileReader and FileWriter should IMHO be deprecated. Use

new InputStreamReader(new FileInputStream(file), "UTF-8")

or something similar.

InputStreamReader also has an overloaded constructor without the encoding parameter; it uses the default platform encoding, System.getProperty("file.encoding").
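To make the point concrete, here is a minimal sketch of reading the same bytes with an explicit charset, so the result no longer depends on the platform default. The class name `ExplicitCharsetDemo`, the helpers `readWith` and `firstLine`, and the temp-file contents are made up for illustration; ISO-8859-1 merely stands in for a non-UTF-8 default such as cp1252. `Files.newBufferedReader` (Java 7+) is shown as one alternative to the `InputStreamReader` form.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetDemo {

    // Hypothetical helper: decodes the whole file with the charset passed in,
    // never with the platform default.
    static String readWith(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Hypothetical helper: the NIO alternative, again with an explicit charset.
    static String firstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9 \u20ac"; // "café €", contains non-ASCII characters
        Path file = Files.createTempFile("unicode-demo", ".txt");
        try {
            // Write the sample as UTF-8 bytes.
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));

            // Decoding with the matching charset recovers the original text.
            System.out.println(readWith(file, StandardCharsets.UTF_8).equals(text));
            System.out.println(firstLine(file, StandardCharsets.UTF_8).equals(text));

            // Decoding the same bytes with a different charset yields junk.
            System.out.println(readWith(file, StandardCharsets.ISO_8859_1).equals(text));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Run as written, the first two checks should print `true` and the last `false`, regardless of the value of `file.encoding`, because every reader is given its charset explicitly.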

Joop Eggen