I am currently learning how to parse XML and HTML. My current code parses the Slickdeals XML feed without problems, but when I try to parse the Slickdeals front page with the same approach, I get this error:
[Fatal Error] :102:23: The entity name must immediately follow the '&' in the entity reference.
Exception in thread "main" org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:246)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class SlickDealMainPage {
    public void parsing() throws Exception {
        String url = "http://slickdeals.net/";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // The SAXParseException is thrown on the next line.
        Document doc = db.parse(new URL(url).openStream());
        doc.getDocumentElement().normalize();
        //System.out.println("Root Element : " + doc.getDocumentElement().getNodeName());
        System.out.println("Root Element : " + doc.getElementsByTagName("Body"));
        NodeList itemList = doc.getElementsByTagName("body");
        // Loop left over from the XML feed version, commented out for now.
        /* for (int temp = 0; temp < itemList.getLength(); temp++)
        {
            Node itemNode = itemList.item(temp);
            System.out.println("\nCurrent Element : " + itemNode.getNodeName());
            Element itemElement = (Element) itemNode;
            System.out.println("\ntitle : " + itemElement.getElementsByTagName("title").item(0).getTextContent());
            System.out.println("\nLink : " + itemElement.getElementsByTagName("link").item(0).getTextContent());
            System.out.println("\nDate Published: " + itemElement.getElementsByTagName("pubDate").item(0).getTextContent());
        }*/
    }
}
I am new to DOM parsing and have searched extensively for an answer to this problem. However, I did not really understand the other answers I found.
Edit: The error occurs at
Document doc = db.parse(new URL(url).openStream());
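In case it helps narrow this down, here is a minimal sketch that feeds the same kind of parser a small in-memory string instead of the live page. I am assuming (not confirmed) that the page contains a bare `&` around line 102, column 23; the class name `AmpersandRepro`, the helper `parseError`, and the sample strings are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

public class AmpersandRepro {

    // Hypothetical helper: returns the parser's error message,
    // or null if the input parsed as well-formed XML.
    static String parseError(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Bare ampersand, as is common in HTML pages (assumed sample text).
        System.out.println(parseError("<p>Tools & Hardware</p>"));
        // Same text with the ampersand escaped as an XML entity.
        System.out.println(parseError("<p>Tools &amp; Hardware</p>"));
    }
}
```

If my assumption is right, the first call should report a parse error and the second should return null, which would suggest the front page is simply not well-formed XML rather than something being wrong with the `DocumentBuilder` setup.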
Thank you for your help!