
I have a txt file and I'm not sure of its encoding. It is probably EBCDIC. I have a problem with umlauts (äöü, ÄÖÜ). For example, the output shows "Mnchen" where it should show "München". URL of the test file: http://wyslijto.pl/plik/yiewa11y3p

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Main {

    public static void main(String[] args) throws IOException {
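        // No charset is given here, so the default charset is used
        // (platform-dependent before Java 18).
        BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream("/Downloads/test.txt")));

        // Second attempt with an explicit charset (needs import java.nio.charset.Charset):
//        BufferedReader in = new BufferedReader(
//                new InputStreamReader(
//                        new FileInputStream("/Downloads/test.txt"), Charset.forName("windows-1252")));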
        String str;
        while ((str = in.readLine()) != null) {
            System.out.println(str);
        }
    }
}
vs97
mmc.dev
  • Try to use a proper encoding and set up the BufferedReader with UTF-8. – curiouscupcake Mar 13 '19 at 21:51
  • Basically, you need to find out the encoding. I doubt that it's EBCDIC, or it would be considerably more broken than it is now. I suggest you find out more about what produced the file, and use that information to try to find out the encoding. – Jon Skeet Mar 13 '19 at 22:06
  • `file` says: "Downloads/test.txt: UTF-8 Unicode (with BOM) text, with no line terminators", so @vs97's answer should work. – Robert Mar 13 '19 at 22:07

1 Answer

Unfortunately, there is no reliable way to detect a file's encoding without knowing what was used to create the file in the first place. I will refer you to this question, which has many suggestions on how to make an intelligent guess at the actual encoding.
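
One way to narrow the guess down, sketched here under the assumption that you already have a short list of candidate charsets, is to decode the bytes strictly and discard every charset that reports malformed input. The names `EncodingGuess` and `decodesCleanly` are hypothetical, and the in-memory sample bytes stand in for the real file. Keep in mind that single-byte charsets such as ISO-8859-1 accept every byte sequence, so a pass there proves little.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingGuess {

    // Hypothetical helper (not a JDK API): returns true if the bytes decode
    // under the given charset without malformed or unmappable sequences.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the file contents: "München" encoded as UTF-8.
        byte[] data = "M\u00fcnchen".getBytes(StandardCharsets.UTF_8);

        Charset[] candidates = {
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1 // accepts any bytes, so it never fails
        };
        for (Charset candidate : candidates) {
            System.out.println(candidate + ": " + decodesCleanly(data, candidate));
        }
    }
}
```

A charset that fails this check can be ruled out. One that passes is only a candidate, so you still need to look at the decoded text to confirm it.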

Once you know the encoding (that's the difficult part), it's simple. For example, if the encoding is UTF-8, pass the UTF-8 charset to your InputStreamReader:

// Needs: import java.nio.charset.StandardCharsets;
BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        new FileInputStream("/Downloads/test.txt"), StandardCharsets.UTF_8));
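
If the file really is UTF-8 with a BOM, as the `file` output quoted in the comments suggests, note that Java's UTF-8 decoder does not drop the BOM. It shows up as the character U+FEFF at the start of the first line. The following is a minimal sketch of skipping it. `Utf8BomReader` and `open` are hypothetical names, and the byte array stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class Utf8BomReader {

    // Hypothetical helper: wraps the stream in a UTF-8 reader and skips a
    // leading byte order mark (U+FEFF) if one is present.
    static BufferedReader open(InputStream stream) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8));
        reader.mark(1);
        if (reader.read() != '\uFEFF') {
            reader.reset(); // no BOM: rewind to the first character
        }
        return reader;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the file: the BOM (bytes EF BB BF) followed by "München".
        byte[] sample = "\uFEFFM\u00fcnchen".getBytes(StandardCharsets.UTF_8);

        try (BufferedReader in = open(new ByteArrayInputStream(sample))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

To use it on the actual file, pass `new FileInputStream("/Downloads/test.txt")` to `open` instead of the byte array. Whether the umlauts then print correctly also depends on the encoding of your console.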

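The charsets guaranteed to be available on every Java platform are the constants in StandardCharsets. Others, such as windows-1252, can be requested by name with Charset.forName if the JVM provides them. The guaranteed ones are: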

  • ISO_8859_1
  • US_ASCII
  • UTF_16
  • UTF_16BE
  • UTF_16LE
  • UTF_8
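
For charsets outside that list, you can ask the JVM whether a name is available before using it. This is a small sketch. `CharsetLookup` and `lookup` are hypothetical names, and "IBM273" (a German EBCDIC code page) is only an example name that may or may not be present in your runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetLookup {

    // Hypothetical helper: returns the named charset if this JVM provides it,
    // otherwise the given fallback. Syntactically illegal names still throw
    // IllegalCharsetNameException.
    static Charset lookup(String name, Charset fallback) {
        return Charset.isSupported(name) ? Charset.forName(name) : fallback;
    }

    public static void main(String[] args) {
        System.out.println(lookup("windows-1252", StandardCharsets.ISO_8859_1));
        System.out.println(lookup("IBM273", StandardCharsets.ISO_8859_1));

        // Every charset this JVM knows about, keyed by canonical name.
        System.out.println(Charset.availableCharsets().size() + " charsets available");
    }
}
```
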
vs97