How to get file encoding type?

Question

Java: How to determine the correct charset encoding of a stream

I want to get the encoding type of a particular file at runtime.

System.getProperties("file.encoding");

The code above displays the same encoding type for every input file.

@RakeshPatel if your question is Excel-specific, you should say so and also include some source code. – oers Mar 14 '12 at 12:01
There is no code. I am using the universalchardet API example and testing it on Excel files. – Rakesh Patel Mar 14 '12 at 12:03

score 2 · Accepted Answer · answered Mar 14 '12 at 11:36


See Marcelo's comment: there are some libraries you can use to guess the encoding of a file, but you can never determine it for sure unless you know it beforehand. Arbitrary text files carry no standard information indicating which encoding was used to write them. Specific file formats may include encoding information, but in a proprietary way specific to that file format.
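To show what "guessing" can look like without a third-party library, here is a minimal sketch. It is not the algorithm of universalchardet or any other library; it swaps in a simpler technique. It first checks for a byte-order mark (only some files carry one), and otherwise returns the first charset from a caller-supplied candidate list that decodes the bytes with no malformed or unmappable sequences. The class name `EncodingGuess` and the candidate list are assumptions. A single-byte charset such as ISO-8859-1 accepts every byte sequence, so it only makes sense as the last candidate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

public class EncodingGuess {

    // Returns the charset announced by a UTF-8 or UTF-16 byte-order mark, if present.
    static Optional<Charset> fromBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    // True if the bytes decode in this charset without any malformed or unmappable input.
    static boolean decodesCleanly(byte[] data, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // A guess, not a determination: BOM first, then the first candidate that decodes cleanly.
    static Optional<Charset> guess(byte[] data, List<Charset> candidates) {
        Optional<Charset> bom = fromBom(data);
        if (bom.isPresent()) {
            return bom;
        }
        for (Charset cs : candidates) {
            if (decodesCleanly(data, cs)) {
                return Optional.of(cs);
            }
        }
        return Optional.empty();
    }
}
```

For Japanese text one might pass something like `List.of(StandardCharsets.UTF_8, Charset.forName("Shift_JIS"))`. A clean decode only proves the bytes are *valid* in that charset, not that it is the one the writer used. The check is meant for plain text; applying it to a binary file such as an .xls workbook is not meaningful.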


pap


OK, I will increase it, but the suggested library does not detect the encoding of a Japanese Excel file. I have tried it, and it does not return any encoding for Excel files. – Rakesh Patel Mar 14 '12 at 11:40

Stefan · Answer 2 · 2017-02-08T13:01:03.703


System.getProperty("file.encoding") returns the JVM's default encoding (traditionally taken from your OS settings), not the encoding of any particular file. You cannot read the encoding out of a plain text file, but you can set the encoding explicitly when writing files to make sure the right one is used.
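A minimal sketch of "set the encoding explicitly": the hypothetical helper `writeThenRead` writes text to a temporary file with one charset and reads it back with another. It shows both that `file.encoding` describes the runtime rather than a file, and what happens when the reader uses a different charset than the writer. It assumes Java 11+ for `Files.writeString`/`Files.readString`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharset {

    // Writes text with one charset and reads it back with another (hypothetical helper).
    // When both are the same, the text survives regardless of the platform default.
    static String writeThenRead(String text, Charset writeAs, Charset readAs) {
        try {
            Path tmp = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.writeString(tmp, text, writeAs);
                return Files.readString(tmp, readAs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both values describe the JVM's default, not any particular file.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        String text = "h\u00e9llo"; // "héllo"
        // Same charset on both sides: the text comes back unchanged.
        String same = writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Wrong charset on the read side: the UTF-8 bytes of "é" (C3 A9) come back as "Ã©".
        String wrong = writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(same.equals(text) + " " + wrong.equals(text));
    }
}
```

The second call reads without error but silently returns the wrong text, which is why the encoding has to be agreed on (or recorded) when the file is written.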

edited Feb 08 '17 at 13:01

answered Mar 14 '12 at 11:26

Stefan


I want to get the encoding of each and every file at runtime. – Rakesh Patel Mar 14 '12 at 11:35
You can only "guess" the encoding at runtime, because it was fixed when the file was written. See here: [link](http://stackoverflow.com/questions/1288899/java-text-file-encoding) – Stefan Mar 14 '12 at 11:38
There is a typo in the question: getProperties should be getProperty. – Federico Gaule Palombarani Feb 07 '17 at 12:48

yggdraa · Answer 3 · 2012-03-14T12:51:31.940

The "file.encoding" property is the default encoding that is applied when your text is saved to a file (unless you specify another one).

There is no standard way to recognize the encoding of a text unless the text itself contains encoding information (as XML files do).
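For XML, the encoding is declared inside the data. A minimal sketch, using the JDK's built-in StAX API (`javax.xml.stream`), reads the `encoding` value from the XML declaration. The class and method names are hypothetical. It only reports what the file *claims*; it does not check that the bytes actually match.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XmlDeclaredEncoding {

    // Returns the encoding named in the XML declaration (<?xml ... encoding="..."?>),
    // or null if the document does not declare one.
    static String declaredEncoding(byte[] xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml));
            try {
                // A new reader is positioned at START_DOCUMENT, where the declaration is exposed.
                return reader.getCharacterEncodingScheme();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not readable as XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(declaredEncoding(doc));
    }
}
```

An .xlsx workbook, for example, is a ZIP archive of XML parts, so this kind of declaration is available there once the archive is unpacked.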

My way of detecting plain text encoding is as follows:

Russian text may come in the following encodings: cp1251, dos866, Unicode, KOI-8. For each Russian letter there are combinations with other letters that never occur in real text. For example, after the letter 'а' you will never see any of "ъ, ы, ь".

For every letter I keep such a set of "impossible following letters". Then I decode the file content in every candidate encoding (not necessarily the full text, but a reasonably sized chunk of bytes) and, for each decoded text, count how many impossible combinations occur. The winner is the encoding with the smallest count. Of course, I also count characters that fall outside the alphabet range as errors. Text can contain typos, so the right encoding may still have errorCount > 0, but for a reasonably sized chunk of text it works quite accurately: the right encoding always has the lowest errorCount.
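A minimal Java sketch of this scoring approach, under stated assumptions: the rule table is hypothetical and deliberately tiny (it generalizes the 'а' example to "ъ, ы, ь never follow any vowel"; a real table would be much larger), the candidates use the JDK names `windows-1251`, `IBM866`, `KOI8-R` and `UTF-8` for the encodings listed above, and "in range" is taken to mean ASCII, whitespace and the Russian alphabet. The class name is also made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RussianCharsetGuesser {

    // Hypothetical rule table: lowercase letter -> letters assumed never to follow it.
    private static final Map<Character, String> IMPOSSIBLE_AFTER = new HashMap<>();
    static {
        for (char vowel : "аеёиоуыэюя".toCharArray()) {
            IMPOSSIBLE_AFTER.put(vowel, "ъыь");
        }
    }

    // Candidate encodings for Russian text, tried in this order.
    static final List<Charset> CANDIDATES = List.of(
            Charset.forName("windows-1251"),
            Charset.forName("IBM866"),
            Charset.forName("KOI8-R"),
            StandardCharsets.UTF_8);

    // ASCII, whitespace and the Russian alphabet (А..я plus Ё/ё) count as in range.
    static boolean inAlphabet(char c) {
        return (c >= 0x20 && c < 0x7F) || Character.isWhitespace(c)
                || (c >= 'А' && c <= 'я') || c == 'Ё' || c == 'ё';
    }

    // Counts out-of-range characters plus impossible letter pairs.
    static int errorCount(String text) {
        int errors = 0;
        char prev = ' ';
        for (char raw : text.toCharArray()) {
            char c = Character.toLowerCase(raw);
            if (!inAlphabet(raw)) {
                errors++; // character outside the expected range
            }
            String banned = IMPOSSIBLE_AFTER.get(prev);
            if (banned != null && banned.indexOf(c) >= 0) {
                errors++; // impossible combination, e.g. a vowel followed by 'ь'
            }
            prev = c;
        }
        return errors;
    }

    // Decodes the chunk in every candidate; the encoding with the fewest errors wins
    // (on a tie, the earlier candidate is kept).
    static Charset guess(byte[] chunk) {
        Charset best = null;
        int bestErrors = Integer.MAX_VALUE;
        for (Charset cs : CANDIDATES) {
            int errors = errorCount(new String(chunk, cs));
            if (errors < bestErrors) {
                best = cs;
                bestErrors = errors;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] chunk = "ваша каша".getBytes(Charset.forName("windows-1251"));
        System.out.println(guess(chunk));
    }
}
```

With such a small table some wrong encodings can tie with the right one on short inputs, which is why, as described above, a larger rule set and a reasonably sized chunk of text matter.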

Maybe you will find this useful somehow.
