I'm trying to parse an XML document using Apache Commons JXPath, but for some reason it can't identify the child nodes after the XML has been parsed. Here's the sample code:

private static void processUrl(String seed){
    String test = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><feed xmlns=\"http://www.w3.org/2005/Atom\" xmlns:media=\"http://search.yahoo.com/mrss/\" xmlns:openSearch=\"http://a9.com/-/spec/opensearchrss/1.0/\" xmlns:gd=\"http://schemas.google.com/g/2005\" xmlns:yt=\"http://gdata.youtube.com/schemas/2007\"><id>http://gdata.youtube.com/feeds/api/videos</id><logo>http://www.youtube.com/img/pic_youtubelogo_123x63.gif</logo><link rel=\"alternate\" type=\"text/html\" href=\"http://www.youtube.com\"/><author><name>YouTube</name><uri>http://www.youtube.com/</uri></author><generator version=\"2.1\" uri=\"http://gdata.youtube.com\">YouTube data API</generator><openSearch:totalResults>144</openSearch:totalResults><entry><id>http://gdata.youtube.com/feeds/api/videos/P1lDDu9L5YQ</id><published>2010-09-20T17:41:38.000Z</published><updated>2011-09-18T22:15:38.000Z</updated><category scheme=\"http://schemas.google.com/g/2005#kind\" term=\"http://gdata.youtube.com/schemas/2007#video\"/><link rel=\"alternate\" type=\"text/html\" href=\"http://www.youtube.com/watch?v=P1lDDu9L5YQ&amp;feature=youtube_gdata\"/></entry></feed>";
    Document doc = null;
    try{
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        ByteArrayInputStream bais = new ByteArrayInputStream(test.toString().getBytes("UTF8"));
        doc = builder.parse(bais);
        bais.close();

        JXPathContext ctx = JXPathContext.newContext(doc);
        List entryNodes = ctx.selectNodes("/feed/entry");
        System.out.println("number of threadNodes " + entryNodes.size());
        int totalThreads = 0;
        for (Object each : entryNodes) {
            totalThreads++;
            Node eachEntryNode = (Node) each;
            JXPathContext msgCtx = JXPathContext.newContext(eachEntryNode);
            String title = (String) msgCtx.getValue("title");
        }
    }catch (Exception ex) {
        ex.printStackTrace();
    }
}

I've used JXPath before and never had any issues. I debugged the document object, and it doesn't seem to have the child node (<entry>) for <feed>. All I'm able to see is the root element. I also tried DOMParser without any luck.

DOMParser parser = new DOMParser();
Document doc = (Document) parser.parseXML(new ByteArrayInputStream(sb0.toString().getBytes("UTF-8")));

I'd appreciate it if someone could provide pointers on this issue.

Shamik
  • One thing I found is that if I remove the attributes from <feed> and simply make it <feed>, then JXPath is able to resolve the node list. Unfortunately, this is a feed which I can't change. Is there any reason for this, or a workaround to handle it? – Shamik Sep 21 '11 at 07:57

1 Answer

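This issue has to do with how JXPath handles default namespaces, which closely follows the XPath 1.0 specification: an unprefixed name in an XPath expression always refers to an element in no namespace, so /feed/entry cannot match elements that the xmlns declaration places in http://www.w3.org/2005/Atom. This also explains why it worked after you removed the default namespace. To get it to work with the default namespace, you can do the following: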

JXPathContext ctx = JXPathContext.newContext(doc.getDocumentElement());
// Register the default namespace, giving it a prefix of your choice
ctx.registerNamespace("myfeed", "http://www.w3.org/2005/Atom");

// Now query for entry elements using the registered prefix
List entryNodes = ctx.selectNodes("myfeed:entry");
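
If you want to see the XPath 1.0 rule in isolation, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API instead of JXPath, so it runs without extra jars. The class name DefaultNamespaceDemo, the count helper, and the trimmed-down feed are made up for illustration. The "myfeed" prefix is the same arbitrary choice as above, and the parser is switched to namespace-aware mode, which the JDK API needs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DefaultNamespaceDemo {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Trimmed-down stand-in for the feed in the question (hypothetical data)
    static final String XML =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
        + "<entry><id>a</id></entry><entry><id>b</id></entry></feed>";

    // Returns how many nodes the given XPath expression selects in XML
    public static int count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // the JDK default is false
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Map the arbitrary prefix "myfeed" to the Atom namespace URI
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "myfeed".equals(prefix) ? ATOM_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed names only match elements in no namespace
        System.out.println(count("/feed/entry"));               // prints 0
        // Prefixed names match the elements in the Atom namespace
        System.out.println(count("/myfeed:feed/myfeed:entry")); // prints 2
    }
}
```

The prefix used in the expression does not have to appear anywhere in the document; only the namespace URI it is bound to matters.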

For more information on the issue, see the following links:

http://markmail.org/message/7iqw4bjrkwerbh46

Make jxpath namespace aware

Garett
  • Thanks a ton, worked like a charm. Thanks for the pointers to the docs; I got a better understanding of the namespace issue. – Shamik Sep 23 '11 at 06:37