
I'm trying to parse an HTML string into a W3C DOM document using NekoHTML, but my document is always null. This is the code I use:

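```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

try {
    // readFile(...) is a helper that is not shown in the question
    String html = readFile("C:/Users/thomas/Desktop/test.html");

    InputStream is = new ByteArrayInputStream(html.getBytes("UTF-8"));

    DOMParser parser = new DOMParser();

    parser.parse(new InputSource(is));
    Document document = parser.getDocument();

    System.out.println(parser.getDocumentSource());
} catch (Exception e) {
    System.out.println(e.getMessage());
}
```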
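Printing the `Document` object says little about whether parsing worked. A more direct check is to look at the root element and at the elements beneath it. The sketch below uses the JDK's built-in `DocumentBuilder` in place of NekoHTML so it runs without extra jars, and it assumes the input is well-formed XHTML (real-world HTML often is not, which is why NekoHTML exists). The input string and the class name `ParseCheck` are hypothetical. With NekoHTML's `DOMParser`, the same checks would apply to the `Document` returned by `parser.getDocument()`, although NekoHTML may report element names in a different case than the source, depending on its configuration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ParseCheck {
    // Parses a well-formed (X)HTML string with the JDK's built-in DOM parser
    // (not NekoHTML). Checked parser exceptions are rethrown unchecked.
    static Document parse(String html) {
        try {
            // Reading from a StringReader avoids the String -> bytes -> stream round trip.
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(html)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input standing in for the contents of test.html.
        String html = "<html><head><title>Test</title></head>"
                + "<body><p>Hello</p><p>World</p></body></html>";
        Document document = parse(html);

        // A successful parse has a root element; inspect it instead of printing the Document.
        System.out.println("root element: " + document.getDocumentElement().getTagName());

        // Counting <p> elements confirms the body content made it into the tree.
        NodeList paragraphs = document.getElementsByTagName("p");
        System.out.println("paragraphs: " + paragraphs.getLength());
        System.out.println("first paragraph: " + paragraphs.item(0).getTextContent());
    }
}
```

Run as is, this prints `root element: html`, `paragraphs: 2` and `first paragraph: Hello`.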
thommie
  • If `document` were null, I can't see which line would throw the NullPointerException. Can you elaborate on the precise issue? – Stefan Dec 10 '12 at 08:34
  • If I break on the last line of the try block, the value of document is null: `document = (org.apache.html.dom.HTMLDocumentImpl) [#document: null]` – thommie Dec 10 '12 at 09:04
  • `[#document: null]` does not mean the document itself is null; it would print `null` otherwise. – Alex Dec 10 '12 at 09:05
  • But it does mean the parsing has failed...? – thommie Dec 10 '12 at 09:06
  • Can you call `document.getBody()`? If you get a NullPointerException, you have an issue. Look at the `toString()` method: it returns the node name (`#document`) and the node value (`null`) - http://grepcode.com/file/repository.jboss.org/maven2/xerces/xercesImpl/2.6.2/org/apache/xerces/dom/NodeImpl.java#NodeImpl.toString%28%29. I think you are AOK. – Stefan Dec 10 '12 at 09:08
  • No, `toString()` returns `[#document: null]`. – thommie Dec 10 '12 at 09:11
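The comment thread turns on how the Xerces `toString()` is built: it combines the node's name and value. The DOM specification fixes both for every `Document` node (`nodeName` is `#document`, `nodeValue` is `null`), so `[#document: null]` appears whether the tree is empty or full. The sketch below checks this with the JDK's built-in parser rather than NekoHTML; the input string and the class name `DocumentToString` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DocumentToString {
    // Parses a well-formed string with the JDK's built-in DOM parser (not NekoHTML).
    static Document parse(String html) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(html)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input.
        Document document = parse("<html><body><p>Hello</p></body></html>");

        // The DOM spec fixes these two values for every Document node, so a
        // toString() built from them cannot tell an empty tree from a full one.
        System.out.println(document.getNodeName());  // prints "#document"
        System.out.println(document.getNodeValue()); // prints "null"

        // The parsed content hangs off the document element instead.
        System.out.println(document.getDocumentElement().getTextContent()); // prints "Hello"
    }
}
```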

0 Answers