Parsing html string to dom document using cyberneko

Asked Dec 10 '12 at 08:28

Active Oct 20 '13 at 22:20

I'm trying to parse an HTML string into a W3C DOM document using NekoHTML, but my document is always null. This is the code I use:

try {
    String html = readFile("C:/Users/thomas/Desktop/test.html");

    InputStream is = new ByteArrayInputStream(html.getBytes("UTF-8"));

    DOMParser parser = new DOMParser();

    parser.parse(new InputSource(is));
    Document document = parser.getDocument();

    System.out.println(parser.getDocumentSource());
} catch (Exception e) {
    System.out.println(e.getMessage());
}

thommie

If `document` were null, I can't see which line of the code would throw the NullPointerException. Can you elaborate on the precise issue? – Stefan Dec 10 '12 at 08:34
If I break on the last line of the try statement the value of document is null: document = (org.apache.html.dom.HTMLDocumentImpl) [#document: null] – thommie Dec 10 '12 at 09:04
`[#document: null]` does not mean the document itself is null. It would print `null` otherwise. – Alex Dec 10 '12 at 09:05
But it does mean the parsing has failed...? – thommie Dec 10 '12 at 09:06
Can you call document.getBody()? If you get a NullPointerException, you have an issue. Look at the toString() method: it returns the node name (#document) and the node value (null) - http://grepcode.com/file/repository.jboss.org/maven2/xerces/xercesImpl/2.6.2/org/apache/xerces/dom/NodeImpl.java#NodeImpl.toString%28%29. I think you are OK. – Stefan Dec 10 '12 at 09:08
No, toString() returns [#document: null] – thommie Dec 10 '12 at 09:11
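
To illustrate the point made in the comments, here is a minimal sketch showing that a successfully parsed DOM document still has the node name `#document` and a null node value, so a `[#document: null]` rendering does not indicate a failed parse. It assumes the JDK's built-in XML parser as a stand-in for NekoHTML (so the input must be well-formed), and `DomToStringDemo`, `parse`, and `describe` are hypothetical names chosen for this example; `describe` rebuilds the "[name: value]" shape from the public DOM API rather than calling the Xerces `toString()` itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomToStringDemo {

    // Parses well-formed markup with the JDK's built-in parser.
    // Assumption: this stands in for NekoHTML's DOMParser, which is not used here.
    static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    // Builds a "[nodeName: nodeValue]" string from the public DOM API.
    static String describe(Node node) {
        return "[" + node.getNodeName() + ": " + node.getNodeValue() + "]";
    }

    public static void main(String[] args) throws Exception {
        Document document = parse("<html><body><p>hello</p></body></html>");

        // The document node itself: name is "#document", value is null.
        System.out.println(describe(document)); // prints "[#document: null]"

        // The parsed content is reachable through the root element.
        Node root = document.getDocumentElement();
        System.out.println(root.getNodeName());
        System.out.println(root.getTextContent());
    }
}
```

A more telling check than printing the document is to inspect `getDocumentElement()` (or `getBody()` on an HTML document) and see whether the expected elements are there.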
