
If I had a file encoded in ISO but wanted to read the file as UTF-8 in Java, would I still get the same text?

Would special characters such as µÃÿ display the same?

Paul

2 Answers


No, you would not. UTF-8 does not encode characters beyond U+007F in the same way as ISO-8859-1 (ISO-8859-1 encodes U+0080 through U+00FF as single bytes \x80 to \xff, while UTF-8 uses two bytes for each of those characters).

You have to use an explicit encoding specification when opening the file: `new InputStreamReader(new FileInputStream(...), <encoding>)`
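A minimal sketch of the point above (the file name is hypothetical): the single byte 0xB5 is 'µ' in ISO-8859-1, but on its own it is not a valid UTF-8 sequence, so decoding it as UTF-8 yields the replacement character U+FFFD instead of the original text.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetDemo {
    public static void main(String[] args) throws IOException {
        // 0xB5 is 'µ' in ISO-8859-1; alone, it is an invalid UTF-8 sequence
        byte[] isoBytes = {(byte) 0xB5};
        Path file = Files.createTempFile("demo", ".txt");
        Files.write(file, isoBytes);

        // Reading with the correct charset recovers the character
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.ISO_8859_1))) {
            System.out.println(r.readLine()); // prints µ
        }

        // Reading the same bytes as UTF-8 substitutes U+FFFD (�)
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            System.out.println(r.readLine()); // prints �
        }

        Files.delete(file);
    }
}
```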

nneonneo
  • On the Internet, just saying "ISO encoding" suggests ISO Latin 1 encoding, since that's the older (more popular) encoding. OP should clarify, to be sure. – nneonneo Sep 19 '12 at 23:09
  • Can you back that up? Why and where is it the most popular? – nullpotent Sep 19 '12 at 23:11
  • Google search. Googling 'ISO encoding' results in no mention of 10646 on the first page at all. Same with "ISO encoding". – nneonneo Sep 19 '12 at 23:16

In short, no. Characters are not represented (bitwise) in ISO-8859-1 the same way they are represented in UTF-8.

However, you can losslessly convert a file from ISO-8859-1 to UTF-8, but not always the other way around, because UTF-8 can represent many characters that ISO-8859-1 cannot.
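As a sketch of such a conversion (assuming the file really is ISO-8859-1, with hypothetical file paths): decode the bytes with the source charset, then re-encode with the target charset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IsoToUtf8 {
    public static void main(String[] args) throws IOException {
        // Hypothetical input/output paths
        Path in = Paths.get("input-iso.txt");
        Path out = Paths.get("output-utf8.txt");

        // Decode the raw bytes as ISO-8859-1 into a Java String (UTF-16 internally)
        String text = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);

        // Re-encode the same characters as UTF-8 and write them out
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Every ISO-8859-1 byte maps to a Unicode code point, so this direction never loses data; going from UTF-8 back to ISO-8859-1 fails for any character outside U+0000–U+00FF.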

My recommendation would be to detect the encoding (see: Java : How to determine the correct charset encoding of a stream) and then to handle each case accordingly.

alvonellos
  • If you can at all avoid using a character detection library, though, then you should. Character detection isn't 100%, and can lead to various weird issues when it gets the answer wrong. – nneonneo Sep 19 '12 at 23:18