
I am trying to find the encoding of a file using a Java program, but it always returns UTF-8 as the output, even though the file is ANSI.

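import java.io.{FileInputStream, InputStreamReader}

// getEncoding reports the charset this reader decodes WITH (the JVM default here, since
// no charset is passed), not the charset the file was written in
val reader = new InputStreamReader(new FileInputStream("FILE_NAME"))
println(reader.getEncoding)
reader.close()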
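
The constant UTF-8 result most likely comes from how `InputStreamReader` works rather than from the file: the one-argument constructor uses the JVM's default charset, and `getEncoding` reports the charset the reader was constructed with without ever looking at the bytes. A minimal Java sketch of this, where `ReaderEncodingDemo` and `readerEncoding` are hypothetical names used only for illustration, feeds differently encoded bytes to readers and compares what `getEncoding` returns:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderEncodingDemo {

    // Reports what InputStreamReader.getEncoding() returns for a reader over `content`.
    // The value comes from `readerCharset`; the bytes themselves are never inspected.
    public static String readerEncoding(byte[] content, Charset readerCharset) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(content), readerCharset)) {
            // Evaluated before the resource is closed (a closed reader would return null).
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] ansiBytes = {(byte) 0xE9, 0x61, 0x62};                  // "éab" in windows-1252
        byte[] utf8Bytes = "\u00E9ab".getBytes(StandardCharsets.UTF_8); // "éab" in UTF-8

        // Same reader charset, different content: identical answers.
        System.out.println(readerEncoding(ansiBytes, StandardCharsets.UTF_8));
        System.out.println(readerEncoding(utf8Bytes, StandardCharsets.UTF_8));
        // Same content, different reader charset: a different answer.
        System.out.println(readerEncoding(ansiBytes, StandardCharsets.ISO_8859_1));
        // The no-argument constructor used in the question falls back to this default.
        System.out.println(Charset.defaultCharset());
    }
}
```

If this holds, changing the file's content never changes the printed value; only changing the charset passed to the reader does.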

I found a library for this, but it is old and does not appear to be properly supported: https://code.google.com/archive/p/juniversalchardet/

There are many answers saying that we can find the encoding of a file, such as "Java : How to determine the correct charset encoding of a stream".

These solutions don't look reliable. According to @Jörg W Mittag, we cannot determine the encoding of a file with certainty.

loneStar
  • It is impossible to find the encoding of a file if you don't already know it. For example, a file that contains the octet sequence `0xA4 0x0D 0x0A` could either be in ISO8859-1 containing the international currency sign followed by a Windows line break or in ISO8859-15 containing the Euro sign followed by a Windows line break. A file containing the octet sequence `0x48 0x65 0x6C 0x6C 0x6F` could be the text `Hello` in ASCII, UTF-8, UTF-7, ISO8859-1, ISO8859-15, Windows-1252, and many others. – Jörg W Mittag Sep 17 '19 at 17:18
  • I don't know that you can **find** the encoding of anything. Encoding is not part of the data; rather, it is how one interprets the data. You probably want something that tries out different encodings and provides a litmus test for true or false. – YisraelU Sep 17 '19 at 17:21
  • @JörgWMittag, then why are there libraries that claim they can determine the encoding of a file, like the one linked in the question? – loneStar Sep 17 '19 at 17:26
  • Lots of people claim lots of things. That doesn't mean it is true. I gave you two examples for which it is impossible to find the encoding. – Jörg W Mittag Sep 17 '19 at 17:27
  • Ask the person who wrote or gave you the file. Otherwise, it's a matter of probabilities. Text files are for experts who want to keep track of these things. Other experts and non-experts don't bother with arbitrary text files. – Tom Blodget Sep 17 '19 at 22:36
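
The comments above can be made concrete. A strict decoder, one that reports malformed input instead of silently substituting replacement characters, gives the kind of true/false litmus test YisraelU describes, and running it on Jörg W Mittag's byte sequences shows why it cannot settle the question: more than one charset passes. This is a minimal Java sketch; `StrictDecodeCheck`, `decodeStrictly` and `plausibleDecodings` are hypothetical names, and the three candidate charsets are an arbitrary choice for the demo.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StrictDecodeCheck {

    // Litmus test: the decoded text if the bytes are valid in `charset`,
    // or null if the decoder meets a malformed or unmappable sequence.
    public static String decodeStrictly(byte[] bytes, Charset charset) {
        try {
            return charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Every candidate that accepts the bytes, mapped to the text it would produce
    // (in the order the candidates were given).
    public static Map<String, String> plausibleDecodings(byte[] bytes, List<Charset> candidates) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Charset cs : candidates) {
            String text = decodeStrictly(bytes, cs);
            if (text != null) {
                result.put(cs.name(), text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Charset> candidates = List.of(
                Charset.forName("UTF-8"),
                Charset.forName("ISO-8859-1"),
                Charset.forName("ISO-8859-15"));

        byte[] currencyLine = {(byte) 0xA4, 0x0D, 0x0A};
        byte[] hello = {0x48, 0x65, 0x6C, 0x6C, 0x6F};

        // UTF-8 rejects a lone 0xA4 byte, but both ISO charsets accept it.
        System.out.println(plausibleDecodings(currencyLine, candidates).keySet()); // [ISO-8859-1, ISO-8859-15]
        // ...with different meanings: U+00A4 (currency sign) vs U+20AC (euro sign).
        plausibleDecodings(currencyLine, candidates).forEach((name, text) ->
                System.out.printf("%s -> U+%04X%n", name, (int) text.charAt(0)));
        // All three accept "Hello" and produce identical text.
        System.out.println(plausibleDecodings(hello, candidates).keySet()); // [UTF-8, ISO-8859-1, ISO-8859-15]
    }
}
```

A strict check can rule charsets out, but when several remain, as with `0xA4 0x0D 0x0A`, choosing between them is a guess.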

1 Answer


I'm not sure about Scala, but have you already tried this library?

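public static Charset guessCharset2(File file) throws IOException {
    // CharsetToolkit comes from a third-party library (not named here);
    // 4096 = number of bytes to inspect, UTF_8 = default charset (presumably the fallback)
    return CharsetToolkit.guessEncoding(file, 4096, StandardCharsets.UTF_8);
}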
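
If adding a library is not an option, a similar guess can be approximated with the JDK alone. The sketch below is a heuristic in the spirit of "it's a matter of probabilities", not a detector: it trusts a byte order mark if present, otherwise accepts UTF-8 when the bytes decode strictly as UTF-8, and otherwise falls back to windows-1252. That fallback is an assumption: "ANSI" usually means the Windows legacy code page, which is windows-1252 only on Western-European locales. `CharsetGuess` and `guessCharset` are hypothetical names, and whether this matches what `CharsetToolkit.guessEncoding` does internally is not confirmed here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetGuess {

    // ASSUMPTION: "ANSI" = Western Windows code page; change this for other locales.
    static final Charset ANSI_FALLBACK = Charset.forName("windows-1252");

    public static Charset guessCharset(Path file) throws IOException {
        // Reads the whole file so no UTF-8 multi-byte sequence is cut at a buffer boundary.
        return guessCharset(Files.readAllBytes(file));
    }

    // Heuristic guess: a probable answer, not a proof.
    public static Charset guessCharset(byte[] data) {
        // 1. A byte order mark is strong evidence.
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        // The JDK's "UTF-16" decoder reads the BOM to pick the byte order.
        if (startsWith(data, 0xFE, 0xFF) || startsWith(data, 0xFF, 0xFE)) return StandardCharsets.UTF_16;
        // 2. Non-ASCII windows-1252 text is rarely also valid UTF-8.
        if (isStrictUtf8(data)) return StandardCharsets.UTF_8;
        // 3. Otherwise fall back to the assumed legacy code page.
        return ANSI_FALLBACK;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    private static boolean isStrictUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ansi = {0x63, 0x61, 0x66, (byte) 0xE9};              // "café" saved as windows-1252
        byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8); // "café" saved as UTF-8
        System.out.println(guessCharset(ansi)); // windows-1252
        System.out.println(guessCharset(utf8)); // UTF-8
    }
}
```

Pure ASCII files are reported as UTF-8, which decodes them identically to windows-1252. Reading only a prefix (like the 4096 bytes in the answer) would need extra care, because a multi-byte sequence cut at the end of the buffer would make the strict UTF-8 check fail on a file that is actually valid UTF-8.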