
I'm reading an XML file which contains German, French, Spanish, English and Polish text.

To handle the Polish letters (which caused the most trouble), I tried to do it like this:

File file = new File(path);
InputStream is = new FileInputStream(file);
Reader reader = new InputStreamReader(is, charset);

InputSource src = new InputSource(reader);
src.setEncoding(charset.name());

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();

saxParser.parse(src, handler);

The problem I encountered was that none of the default charsets displays the text properly. Some produce question marks, others produce a combination of other characters, e.g. ÄÖ.

To narrow it down, I wrote another snippet to test which charset works:

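import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetTest {
    public static void main(String[] args) {
        Charset charset = StandardCharsets.UTF_8;
        String chars = "śłuna długie";
        // Encode to bytes and decode again with the same charset, then print.
        System.out.println(new String(chars.getBytes(charset), charset));
    }
}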

Again, I tested every single one, but nothing works. I hope you've got an idea.

codewing
  • In your first code snippet you don't output anything, so it is hard to tell what is happening. The second code snippet works: `śłuna długie` is correctly printed to the console. What console do you use? *Where* do you try to print this text? Does the output device support the characters you want to print? –  Jun 14 '15 at 13:31
  • It's kind of difficult to provide the right code of a parser because there are so many important parts of it. OK, so the problem could be my console. I hadn't thought of that. It's the default console of my IntelliJ IDEA – codewing Jun 14 '15 at 14:11
  • I changed my project and IDE encoding to UTF-8 and it worked. Thanks, buddy – codewing Jun 14 '15 at 14:19

1 Answer


My solution: change the encoding of your IDE.

I was using the default encoding of my IDE (IntelliJ), which was "windows-1252" because I'm using Windows on this PC.
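To illustrate why a windows-1252 default can produce both symptoms from the question (question marks and pairs of wrong characters), here is a minimal sketch. The class name `MojibakeDemo` and the helper `reencode` are made up for this illustration, and the Polish letters are written as `\u` escapes on the assumption that the source file's own encoding should not influence the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode the text with one charset, then decode the resulting bytes with another.
    public static String reencode(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // "d\u0142ugie" is the word "długie"; the escape keeps this file ASCII-only.
        String polish = "d\u0142ugie";
        Charset cp1252 = Charset.forName("windows-1252");

        // UTF-8 bytes decoded as windows-1252: the two bytes of the Polish letter
        // turn into two unrelated characters.
        String garbled = reencode(polish, StandardCharsets.UTF_8, cp1252);

        // ISO-8859-1 cannot represent the Polish letter, so encoding substitutes '?'.
        String lossy = reencode(polish, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);

        // Same charset on both sides: the text survives unchanged.
        String intact = reencode(polish, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("garbled differs from original: " + !garbled.equals(polish));
        System.out.println("lossy contains '?': " + lossy.contains("?"));
        System.out.println("intact equals original: " + intact.equals(polish));
    }
}
```

The results are printed as booleans rather than as the strings themselves, because a misconfigured console would garble the demonstration too.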

So I changed it to UTF-8 and the short test code worked fine for me.
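If you would rather not depend on the JVM default when printing, one option is to wrap `System.out` in a `PrintStream` with an explicit encoding. This is only a sketch, not what I did above: the class name `Utf8ConsoleDemo` and the helper `printedBytes` are hypothetical, and the console itself still has to be set to decode UTF-8, otherwise the correct bytes will be shown as the wrong characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class Utf8ConsoleDemo {
    // Returns the bytes a PrintStream with the given encoding emits for the text.
    public static byte[] printedBytes(String text, String encoding) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(buffer, true, encoding);
            out.print(text);
            out.flush();
            return buffer.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Shows which charset the JVM picked up from its environment.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Write the test string from the question as UTF-8 bytes, regardless of the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("\u015B\u0142una d\u0142ugie");

        // One Polish letter is two bytes in UTF-8 but a single '?' in windows-1252.
        System.out.println("UTF-8 bytes: " + printedBytes("\u0142", "UTF-8").length);
        System.out.println("windows-1252 bytes: " + printedBytes("\u0142", "windows-1252").length);
    }
}
```

Changing the IDE and project encoding, as described above, remains the simpler fix because it covers both the source files and the console in one step.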

codewing