XPath produces garbled output instead of Unicode characters

Question

I am parsing this XML file:

<?xml version="1.0" encoding="UTF-8"?>

<tests>
    <test category="Русский"/>
    <test category="ελληνικά"/>
    <test category="中文"/>
    <test category="English"/>
</tests>

The main class is:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TestUnicode {
    public static void main(String[] args) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression lolwhy = xpath.compile("//test");
        // Pass raw bytes so the parser applies the encoding from the XML declaration.
        // try-with-resources closes the stream once evaluation is done.
        try (InputStream in = new FileInputStream(new File("sample.xml"))) {
            NodeList parent = (NodeList) lolwhy.evaluate(
                    new InputSource(in),
                    XPathConstants.NODESET);
            System.out.println(parent.getLength());
            for (int i = 0; i < parent.getLength(); i++) {
                // System.out encodes this text with the platform default charset.
                System.out.println(parent.item(i).getAttributes().
                        getNamedItem("category").getNodeValue());
            }
        }
    }
}

And the output is:

4
???????
????????
??
English

What am I doing wrong here?

EDIT: OK, this issue turned out to be the same one described in "hebrew appears as question marks in netbeans", and the solution is given in "Setting the default Java character encoding?".

Your Java console doesn't understand the encoding of the text sent to it. Try writing your output to a text file and reading it. — Hovercraft Full Of Eels, Jun 05 '11 at 13:50
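A minimal sketch of the file-based check suggested in the comment above, assuming Java 7 or later. The class name `WriteUtf8File`, the file name `categories.txt`, and the hard-coded values (standing in for the parsed attribute values) are illustrative and not part of the original code. The strings are written as Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class WriteUtf8File {
    // Writes each value on its own line, encoded explicitly as UTF-8
    // instead of relying on the platform default charset.
    static void writeLines(Path target, List<String> values) throws IOException {
        Files.write(target, values, StandardCharsets.UTF_8);
    }

    // Reads the file back, decoding explicitly as UTF-8.
    static List<String> readLines(Path source) throws IOException {
        return Files.readAllLines(source, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-ins for the "category" attribute values.
        List<String> values = Arrays.asList(
                "\u0420\u0443\u0441\u0441\u043a\u0438\u0439", // Russian sample
                "\u03b5\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac", // Greek sample
                "\u4e2d\u6587", // Chinese sample
                "English");
        Path out = Paths.get("categories.txt"); // hypothetical file name
        writeLines(out, values);
        // Only ASCII goes to the console here, so the console encoding is irrelevant.
        System.out.println("Round trip intact: " + readLines(out).equals(values));
    }
}
```

If the file shows the expected characters in a UTF-8 aware editor, the parsed strings are fine and only the console display is at fault.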

score 0 · Accepted Answer · answered Jun 05 '11 at 13:49

It could be that the parsing is fine and only the output is wrong.

If you use a font that doesn't contain those characters, or if you write the values to HTML but declare the wrong encoding, this can be the result.

The font issue is the more likely one.
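One way to check whether the parsing really is fine is to print the code points of each value using ASCII only, which sidesteps both the font and the console encoding. This is a sketch assuming Java 8 or later; the class name `CodePointDump` and the helper `toCodePoints` are illustrative names, not part of the original code.

```java
public class CodePointDump {
    // Renders every code point as U+XXXX, separated by spaces.
    // The result is pure ASCII, so it displays correctly in any console.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Stand-in for one parsed attribute value (the Chinese sample).
        String value = "\u4e2d\u6587";
        System.out.println(toCodePoints(value)); // prints "U+4E2D U+6587"
    }
}
```

Calling `toCodePoints` on `getNodeValue()` inside the loop would show whether the strings hold the expected code points. If they do, the parser decoded the file correctly and the problem lies in how the text is displayed.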

answered Jun 05 '11 at 13:49 by GolezTrol

Yes, it seems to be some kind of console output problem in NetBeans, but the funny thing is that it seems to print garbage no matter what font I use. – zbstof Jun 05 '11 at 14:59

score 0 · Answer 2 · answered Jun 05 '11 at 13:52

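System.out.println is the culprit: it encodes text with the platform's default charset, which may not match what the console expects. See if this helps: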

http://hints.macworld.com/article.php?story=20050208053951714
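A minimal sketch of the usual PrintStream approach, which may or may not match what the linked article describes: wrap `System.out` in a `PrintStream` that encodes as UTF-8. The class name `Utf8Console` and the helper `utf8` are illustrative. This only controls the bytes Java emits. The console must also decode them as UTF-8, otherwise each multi-byte character shows up as several unrelated characters.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Wraps a stream so that printed text is encoded as UTF-8,
    // regardless of the platform default charset. Autoflush is enabled.
    static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        // Stand-ins for the parsed attribute values, written as escapes
        // so the example does not depend on the source file encoding.
        out.println("\u0420\u0443\u0441\u0441\u043a\u0438\u0439"); // Russian sample
        out.println("\u4e2d\u6587"); // Chinese sample
    }
}
```

Whether the characters then display correctly depends on the encoding the console or IDE output window uses to decode the bytes.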

answered Jun 05 '11 at 13:52 by DmitryK

Thanks, but PrintStream workaround only prints: 4 Ð ÑƒÑ�Ñ�ÐºÐ¸Ð¹ ÎµÎ»Î»Î·Î½Î¹ÎºÎ¬ ä¸æ–‡ English – zbstof Jun 05 '11 at 14:57
