I want to be able to detect the character encoding of a file without knowing about the file's metadata. Is there a way to do this without looping through every row of a file and looking for characters of a specific encoding?
-
You can refer to this: https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream – Qingfei Yuan Jul 09 '19 at 14:46
-
Even Google can't do it in every case, even when looping through every row – Mikhail Ionkin Jul 09 '19 at 20:12
-
Possible duplicate of [Java : How to determine the correct charset encoding of a stream](https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream) – Tom Blodget Jul 11 '19 at 04:19
-
You mean you want to guess, based on the current content of the file, right? – Tom Blodget Jul 11 '19 at 04:19
-
Not guess because it has to be accurate. I don't think what I've asked is possible. – Chris Jul 11 '19 at 13:03
-
That is correct. Communicating text requires both the bytes and the character encoding. Often the character encoding is implied or communicated on a side channel, such as the HTTP response header Content-Type. But it is almost never stored with text files (though it could be in some file systems). That said, a text file that begins with the bytes of a Unicode BOM has a very high probability of being in one of the easily distinguished Unicode encodings. – Tom Blodget Jul 13 '19 at 13:47
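To illustrate the BOM point, here is a minimal sketch in Java (the language of the linked question) that inspects only the first few bytes of a file. The class name `BomSniffer` and the method `detect` are hypothetical names chosen for this example, not part of any library. An empty result means only that no BOM was found, not that the encoding is known; most non-Unicode files, and many UTF-8 files, carry no BOM at all.

```java
import java.util.Optional;

// Hypothetical helper for illustration: maps a leading BOM to an encoding name.
final class BomSniffer {
    private BomSniffer() {}

    /**
     * Returns the encoding name implied by a leading byte order mark,
     * or Optional.empty() when the bytes do not start with a known BOM.
     * Pass in the first four (or more) bytes of the file.
     */
    static Optional<String> detect(byte[] head) {
        // The UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE),
        // so the 4-byte patterns are checked first. This is still a heuristic:
        // FF FE 00 00 could also be UTF-16LE text that starts with U+0000.
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return Optional.of("UTF-32LE");
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return Optional.of("UTF-32BE");
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return Optional.of("UTF-8");
        if (startsWith(head, 0xFF, 0xFE)) return Optional.of("UTF-16LE");
        if (startsWith(head, 0xFE, 0xFF)) return Optional.of("UTF-16BE");
        return Optional.empty();
    }

    // Compares the leading bytes of data, treated as unsigned, against prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would read the first four bytes of the file (for example with `InputStream.readNBytes(4)` on Java 11 or later) and pass them to `BomSniffer.detect`. When the result is empty, the encoding still has to come from a side channel or from a statistical guess.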