
I have this line "ĆćĘ꣏źł" in file.csv, which is encoded as ANSI (according to Notepad++). How can I print this line to the console correctly, as "CcEeLzzl"?

To remove the accents I'm using StringUtils.stripAccents(myLine) from Apache Commons Lang, but I still get "��Ee����".

```java
        BufferedReader br = null;
        try {
            String sCurrentLine;
            br = new BufferedReader(new FileReader(fileName2));
            // stripAccents(null) returns null, so the loop still ends at end of file
            while ((sCurrentLine = StringUtils.stripAccents(br.readLine())) != null) {
                System.out.println(sCurrentLine);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                // Closing the BufferedReader also closes the wrapped FileReader
                if (br != null)
                    br.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
```

I want the console to show "CcEeLzzl", not "ĆćĘ꣏źł". Please help me.
  • Welcome to SO! Maybe you will find your answer here: https://stackoverflow.com/questions/18141162/how-to-convert-ansi-to-utf8-in-java – Jonathan Oct 17 '19 at 20:40
  • Thank You, @Jonathan !!!) I tried this one. It did convert my file to UTF-8, but I got this line like that ÆæEe£�Ÿ³. – bamboo__bear Oct 17 '19 at 20:53
  • I suspect the string is correct and no conversion is needed. Look up curiosa's answer, you may have to convert the letters manually. – Jonathan Oct 17 '19 at 20:55
  • The first thing you should try is `br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName2), "windows-1250"));`. By “console” do you mean a Microsoft Windows command window? – VGR Oct 17 '19 at 21:15
  • @VGR, no sorry, by "console" I mean to show line with System.out.println(lineFromANSIFile) – bamboo__bear Oct 17 '19 at 21:16
  • And where are you viewing the results of your program’s calling System.out.println? – VGR Oct 17 '19 at 21:18
  • @VGR Thank You so much!!! changed this line " br = new BufferedReader(new FileReader(fileName2));" to Yours "br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName2), "windows-1250"));" and It perfectly works. – bamboo__bear Oct 17 '19 at 21:23
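
The fix from the comments can be sketched as a small self-contained program. This is only an illustration: the class name `ReadAnsiFile`, the helper `readLines`, and the temporary file are made up for the demo, and it assumes the JDK ships the `windows-1250` and `windows-1252` charsets (a full JDK normally does). The sample text is written with Unicode escapes so that the source file's own encoding does not matter.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadAnsiFile {
    // Assumption: the file was saved as windows-1250 (Central European "ANSI").
    static final Charset ANSI_PL = Charset.forName("windows-1250");

    // The line from the question, spelled with Unicode escapes.
    static final String SAMPLE = "\u0106\u0107\u0118\u0119\u0141\u0179\u017A\u0142";

    /** Reads all lines of the file, decoding its bytes with the given charset. */
    static List<String> readLines(Path file, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file.toFile()), charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("ansi-demo", ".csv");
        try {
            // Write the sample as windows-1250 bytes, like the ANSI file in the question.
            Files.write(tmp, SAMPLE.getBytes(ANSI_PL));
            // Decoded with the charset the file was written in: the original letters.
            System.out.println(readLines(tmp, ANSI_PL));
            // Decoded with a different single-byte charset: other characters appear.
            System.out.println(readLines(tmp, Charset.forName("windows-1252")));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

What the console finally displays also depends on the encoding of the terminal or IDE that shows `System.out`, which is why no expected output is listed here.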

1 Answer

It looks like you want to apply a custom mapping from Polish letters to ASCII, which is outside the domain of stripAccents. You probably have to define it yourself, e.g. as done below (only shown for "Ł" and "ł").

Spoiler: no, you don't have to. The Windows ANSI encoding was the culprit. With proper decoding, StringUtils.stripAccents worked fine. See the comments. But if you ever leave stripAccents's domain...

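```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;

public void Ll() {
    // Custom mapping for letters that should be replaced by hand
    Map<String, String> map = new HashMap<>();
    map.put("Ł", "L");
    map.put("ł", "l");

    // Split into single characters, map each one, and join the result
    System.out.println(Arrays.stream("ŁałaŁała".split("(?!^)"))
            .map(c -> map.getOrDefault(c, c))
            .collect(Collectors.joining("")));
}
```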
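
If Apache Commons Lang is not available, the same idea can be sketched with the standard library only. This is a swapped-in technique, not a description of how StringUtils.stripAccents is implemented: `java.text.Normalizer` decomposes accented letters into a base letter plus combining marks, the marks are dropped, and letters without a decomposition (such as "Ł" and "ł") go through a hand-written map like the one above. The class name `AsciiFold` and the method `fold` are made up for this example.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class AsciiFold {
    // Letters that have no Unicode decomposition need an explicit mapping.
    private static final Map<Character, Character> EXTRA = new HashMap<>();
    static {
        EXTRA.put('\u0141', 'L'); // Ł
        EXTRA.put('\u0142', 'l'); // ł
    }

    /** Removes combining marks after NFD decomposition and applies the extra map. */
    static String fold(String input) {
        // NFD splits e.g. "Ć" into "C" followed by a combining acute accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue; // drop the accent itself
            }
            Character mapped = EXTRA.get(c);
            out.append(mapped != null ? mapped.charValue() : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The line from the question, written with Unicode escapes.
        System.out.println(fold("\u0106\u0107\u0118\u0119\u0141\u0179\u017A\u0142"));
        System.out.println(fold("\u0141a\u0142a\u0141a\u0142a"));
    }
}
```

The input still has to be decoded with the right charset first; folding a string that already contains replacement characters cannot recover the original letters.
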
Curiosa Globunznik
  • Thank you for the answer, but when I put my line from the file instead of the string "ŁałaŁała", it doesn't work. It shows "�a�a�a�a". StringUtils also works fine when I try to change "ŁałaŁała" to "LalaLala", but not with the line from the ANSI file :( – bamboo__bear Oct 17 '19 at 21:13
  • Try to read the file this way (taken from [here](https://stackoverflow.com/questions/18556104/read-and-write-text-in-ansi-format)): `reader = new BufferedReader(new InputStreamReader(new FileInputStream(), "Cp1252"));` Java strings are Unicode internally, so the file's bytes have to be decoded with the right charset to get there. That's supposed to read Windows ANSI. Then proceed with StringUtils, I suppose :-/ Isn't it a bit late for hardcore coding as of GMT-1? – Curiosa Globunznik Oct 17 '19 at 21:15