I am working with files whose encoding is not known in advance. I tried to get the encoding with these lines of Java:

InputStream in = new FileInputStream(new File("D:\\lbl2\\1 (26).LBL"));
InputStreamReader inputStreamReader = new InputStreamReader(in);
System.out.print(inputStreamReader.getEncoding());

The output is UTF8. The problem is that when I view the file content in a browser or in a text editor such as Notepad++, the characters are not displayed correctly. When I change the encoding to Windows-1256 instead, all of the characters are displayed correctly and are readable. Am I making a mistake?

mhbashari

2 Answers

Java does not attempt to detect the encoding of a file. getEncoding returns the encoding that was selected in the InputStreamReader constructor. If you don't use one of the constructors that take a character set parameter, you get the 'platform default charset', according to Oracle's documentation.
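To illustrate the point, here is a minimal sketch showing that getEncoding only echoes the charset passed to the constructor, whatever the bytes actually contain. The class name GetEncodingDemo, the helper reportedCharset and the two sample bytes are made up for this example. Because getEncoding may return a historical alias such as "UTF8", the sketch normalises the name through Charset.forName.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GetEncodingDemo {
    // Hypothetical helper: wraps the bytes in an InputStreamReader built with
    // the chosen charset and returns the charset the reader reports.
    static Charset reportedCharset(byte[] data, Charset chosen) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(data), chosen)) {
            // getEncoding() must be called before the reader is closed;
            // it can return an alias, so resolve it to a Charset object.
            return Charset.forName(reader.getEncoding());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: 0xE3 followed by 0xD1 is not a valid UTF-8 sequence.
        byte[] data = {(byte) 0xE3, (byte) 0xD1};

        // The reader still reports whatever charset it was constructed with.
        System.out.println(reportedCharset(data, StandardCharsets.UTF_8));
        System.out.println(reportedCharset(data, StandardCharsets.ISO_8859_1));
    }
}
```

In both calls the reported charset is simply the one that was passed in, which is why the code in the question prints the platform default rather than the real encoding of the file.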

This question discusses what the platform default charset is, and how you can change it.

If you know in advance that this file is Windows-1256, you can use:

InputStreamReader inputStreamReader = new InputStreamReader(in, "Windows-1256");
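As a rough sketch of what that looks like end to end, the example below reads the same bytes once as Windows-1256 and once as UTF-8. The class name ExplicitCharsetRead, the helper readAll and the sample text are invented for illustration, and the sketch assumes the JDK in use ships the windows-1256 charset (the usual desktop JDK builds do).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetRead {
    // Hypothetical helper: reads the whole stream as text using the given charset.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Assumption: windows-1256 is available in this JDK.
        Charset cp1256 = Charset.forName("windows-1256");

        // Made-up sample: an Arabic word encoded as Windows-1256 bytes.
        byte[] sample = "\u0633\u0644\u0627\u0645".getBytes(cp1256);

        // Decoded with the right charset, the text comes back intact.
        System.out.println(readAll(new ByteArrayInputStream(sample), cp1256));

        // Decoded as UTF-8, the invalid sequences are silently replaced with U+FFFD.
        System.out.println(readAll(new ByteArrayInputStream(sample), StandardCharsets.UTF_8));
    }
}
```

For a real file, the ByteArrayInputStream would be replaced by the FileInputStream from the question.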

Attempting to detect the encoding of a file is unreliable: see, for example, the "Bush hid the facts" issue in Windows Notepad.

Mike Dimmick

Unfortunately, there is no 100% reliable way to detect the encoding of a file and, as the other answer points out, Java does not try by default. It simply assumes the platform's default encoding.

If you know the files are all in a single encoding then great, you can just specify that encoding and life is good.

If you know that some files are in UTF-8 and some files are in a single legacy encoding then you can generally get away with trying a strict* UTF-8 decode first. If the strict UTF-8 decode errors out then you move on to your legacy encoding.

If you have a wider mix of encodings, things get considerably harder; you may have to resort to some quite complex language processing to sort them out.

* I believe that to get a strict decode in Java you need to first get the "Charset", then get a "CharsetDecoder" from it, and then use the "onMalformedInput" method to set it to strict mode (CodingErrorAction.REPORT).
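The "strict UTF-8 first, then legacy encoding" approach could be sketched roughly as follows. The class name Utf8WithFallback and the method decode are made up for this example, and the fallback charset is whatever legacy encoding you expect (Windows-1256 in the question). A decoder obtained from Charset.newDecoder() already reports errors by default, so the explicit REPORT settings are there mainly to make the intent visible.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8WithFallback {
    // Hypothetical helper: tries a strict UTF-8 decode and, if the bytes are
    // not valid UTF-8, decodes them with the supplied legacy charset instead.
    static String decode(byte[] data, Charset fallback) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // Throws CharacterCodingException on the first invalid sequence.
            return strictUtf8.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the legacy encoding.
            return new String(data, fallback);
        }
    }

    public static void main(String[] args) {
        // Assumption: windows-1256 is available in this JDK.
        Charset legacy = Charset.forName("windows-1256");
        String word = "\u0633\u0644\u0627\u0645"; // made-up sample text

        // Valid UTF-8 input is decoded as UTF-8.
        System.out.println(decode(word.getBytes(StandardCharsets.UTF_8), legacy).equals(word));

        // Windows-1256 input fails the strict decode and goes through the fallback.
        System.out.println(decode(word.getBytes(legacy), legacy).equals(word));
    }
}
```

This works because text in a legacy single-byte encoding that contains non-ASCII characters is very unlikely to also be valid UTF-8. Pure ASCII files decode identically either way, so they are not a problem.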

plugwash