I want to extract data from an HTML page using jsoup and XPath.

This is my Java code:

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.w3c.dom.NodeList;

public class RssFeedRead {


    public static void main(String args[])
    {
        try
        {
         Document doc = Jsoup.connect("http://timesofindia.indiatimes.com/world/china/China-sees-red-in-Abes-WWII-shrine-visit/articleshow/27989418.cms").get();
         String title = doc.title();
         System.out.println(title);

          String exp = "//*[@id='cmtMainBox']/div/div[@class='cmtBox']/div/div[@class='box']/div[@class='cmt']/div/span";

          XPathFactory factory = XPathFactory.newInstance();
          XPath xPath = factory.newXPath();
          XPathExpression expr = xPath.compile(exp);

          NodeList node = (NodeList) expr.evaluate(doc, XPathConstants.NODE);

          for (int i = 0; i < node.getLength(); i++)
          {
              System.out.println(expr.evaluate(node.item(i), XPathConstants.STRING)); 
          }

        }
        catch(Exception e)
        {
            System.out.println(e);
        }

    }

}

This error occurred:

java.lang.ClassCastException: org.jsoup.nodes.Document cannot be cast to org.w3c.dom.Node

Please help me solve this error.

G.S
user3122429

3 Answers

I am new here; after a quick investigation, I think you should keep two points in mind:

1) You need to convert the jsoup Document to an org.w3c.dom.Document; a plain cast will not work. See 17802445; to run that code you need to download DOMBuilder.

2) I don't know much about your page, which is served with a .cms extension; does the parser support it? I tested the code from 17802445 with other links and it works. With your link, however, I get a java.lang.NullPointerException, which suggests the conversion failed. Please check it.
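As a rough sketch of the conversion in point 1, newer jsoup releases ship a helper class, org.jsoup.helper.W3CDom, that can stand in for the external DOMBuilder. The class name JsoupToW3c, the SAMPLE_HTML markup and the shortened XPath expression are made up for illustration; the real page structure is not verified here.

```java
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.w3c.dom.NodeList;

public class JsoupToW3c {

    // Hypothetical markup standing in for the fetched page (assumption).
    static final String SAMPLE_HTML =
        "<html><head><title>demo</title></head><body>"
        + "<div id='cmtMainBox'>"
        + "<div class='cmt'><span>first</span></div>"
        + "<div class='cmt'><span>second</span></div>"
        + "</div></body></html>";

    static List<String> commentTexts(String html) {
        // Both libraries have a class named Document, so use qualified names.
        org.jsoup.nodes.Document jsoupDoc = Jsoup.parse(html);
        org.w3c.dom.Document w3cDoc = new W3CDom().fromJsoup(jsoupDoc);

        // local-name() keeps the query working whether or not the converted
        // elements end up in the XHTML namespace.
        String exp = "//*[@id='cmtMainBox']//*[local-name()='span']";
        XPath xPath = XPathFactory.newInstance().newXPath();

        List<String> texts = new ArrayList<>();
        try {
            // NODESET is the return type that yields a NodeList.
            NodeList nodes = (NodeList) xPath.evaluate(exp, w3cDoc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + exp, e);
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : commentTexts(SAMPLE_HTML)) {
            System.out.println(text);
        }
    }
}
```

For the live page, Jsoup.parse(html) would be replaced by the Jsoup.connect(...).get() call from the question, and the full XPath expression would need the same namespace care.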

Hope you can solve it!

My first answer.

Community
wangdq

Please highlight the line where the exception was thrown and don't omit the stack trace.

This is the problematic line:

NodeList node = (NodeList) expr.evaluate(doc, XPathConstants.NODE);

You are mixing two APIs for document parsing and handling, XPath and JSoup. An XPath expression does not know about JSoup documents and can't handle them.

You need to decide which of the two APIs you want to use for this job.
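To illustrate the XPath-only route, here is a minimal sketch that uses nothing but the JDK: the markup is parsed into an org.w3c.dom.Document, which is the kind of object javax.xml.xpath can evaluate against. The class name XPathOnW3cDom and the SAMPLE_XML string are invented for the example, and the input has to be well-formed XML, which real-world HTML often is not.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XPathOnW3cDom {

    // Hypothetical well-formed markup (assumption, not the real page).
    static final String SAMPLE_XML =
        "<div id=\"cmtMainBox\">"
        + "<div class=\"cmt\"><span>first</span></div>"
        + "<div class=\"cmt\"><span>second</span></div>"
        + "</div>";

    static List<String> spanTexts(String xml) {
        List<String> texts = new ArrayList<>();
        try {
            // The JDK parser produces an org.w3c.dom.Document directly.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xPath = XPathFactory.newInstance().newXPath();
            // Request NODESET when the result is cast to NodeList;
            // NODE returns a single node.
            NodeList nodes = (NodeList) xPath.evaluate(
                "//div[@class='cmt']/span", doc, XPathConstants.NODESET);

            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : spanTexts(SAMPLE_XML)) {
            System.out.println(text);
        }
    }
}
```

Because the JDK parser rejects malformed markup, this route tends to suit XML feeds better than arbitrary web pages.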

Hauke Ingmar Schmidt

The error is clear enough: a jsoup Document cannot be cast to a w3c Node.

The offending line is presumably NodeList node = (NodeList) expr.evaluate(doc, XPathConstants.NODE);

You'll probably have to convert the jsoup Document to a w3c Document first (if jsoup offers such a conversion; I'm not familiar with this API).

The jsoup Javadoc should have everything you need.
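As one possible alternative to XPath altogether, the jsoup Javadoc documents CSS-style selectors on its own Document class. The sketch below assumes a simplified selector and made-up markup (JsoupSelect and SAMPLE_HTML are hypothetical names), so the selector would have to be adapted to the real page.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSelect {

    // Hypothetical markup standing in for the fetched page (assumption).
    static final String SAMPLE_HTML =
        "<html><body><div id='cmtMainBox'>"
        + "<div class='cmt'><div><span>first</span></div></div>"
        + "<div class='cmt'><div><span>second</span></div></div>"
        + "</div></body></html>";

    static List<String> commentTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        // Simplified CSS counterpart of the XPath in the question.
        // Unlike @class='cmt', .cmt also matches elements that carry
        // additional classes.
        for (Element span : doc.select("#cmtMainBox div.cmt > div > span")) {
            texts.add(span.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : commentTexts(SAMPLE_HTML)) {
            System.out.println(text);
        }
    }
}
```

This keeps everything inside one API, so no conversion between jsoup and W3C types is needed.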

Toni Toni Chopper