I am parsing HTML with the Java SAX parser. The HTML is mostly well formed, but not all tags are terminated,
and unfortunately I cannot change the source.
Is it possible to tell the Java SAX parser to ignore fatal errors?
The parser throws a SAXParseException with messages such as:
The entity "nbsp" was referenced, but not declared.
The element type "img" must be terminated by the matching end-tag "</img>".
The element type "meta" must be terminated by the matching end-tag "</meta>".
This is the code I am using:
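import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringComments(true);
//factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, false);
// factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
// xml is a String holding the HTML source
InputSource is = new InputSource(new StringReader(xml));
Document doc = builder.parse(is); // the SAXParseException is thrown here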
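
To make the problem easier to reproduce, here is a minimal self-contained sketch of the same setup. The class name HtmlParseRepro, the helper parseError, and the sample inputs are made up for illustration and are not from my real code. It only shows which kinds of input trigger the fatal errors listed above, not a fix.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class HtmlParseRepro {

    // Hypothetical helper: returns the parser's error message,
    // or null if the input parsed without a fatal error.
    static String parseError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Same exception type as in the question
            return e.getMessage();
        } catch (Exception e) {
            // Configuration or I/O problems are not what this sketch is about
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Undeclared HTML entity: fails
        System.out.println(parseError("<p>a&nbsp;b</p>"));
        // Unterminated void element: fails
        System.out.println(parseError("<html><head><meta charset=\"utf-8\"></head></html>"));
        // Well-formed equivalent with a self-closed tag: parses
        System.out.println(parseError("<p><img src=\"a.png\"/></p>"));
    }
}
```

The first two inputs fail for the same reasons as my real HTML (an undeclared entity and an unterminated tag), while the third parses because the img tag is self-closed. The exact message text may vary with the JDK and locale.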