A question about the SAX XML parser on Android, using Java: I need to parse XML files that I get from the web and have no control over. Some contain errors that cause the parser to abort with messages like "mismatched tag" or "not well-formed (invalid token)".
Those errors don't matter to me. I want to ignore them and keep going, and I can handle the broken XML structure myself. But I cannot fix the XML files, because they are not mine. How can I tell SAX on Android (org.xml.sax.XMLReader) not to throw an exception and to keep going? Attaching an ErrorHandler didn't work, and catching the exception is of no use because I can't resume parsing where it stopped.
My XML is not HTML, but here are some (X)HTML examples where browsers ignore errors and keep going. I want to do this too.
- Browsers are fine with "<br>" instead of "<br/>" even though the tag is never closed.
- "<b><i> text </b></i>" works even though the closing tags are in the wrong order.
- "odds & ends" is accepted despite the invalid token; "odds &amp; ends" would be correct.
I'd prefer to not write my own parser, dealing with character set conversions and all that. I don't need to validate XML. Here's my code, reduced to the essentials:
XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
r.setErrorHandler(new MyLenientErrorHandlerThatNeverThrows());
r.setContentHandler(new MyImporterThatExtendsDefaultHandler());
r.parse(new InputSource(new BufferedReader(...)));
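In case it helps to reproduce this, here is a minimal, self-contained sketch that feeds XML versions of the three cases from the list above to a parser with an error handler that swallows everything. It is written against the desktop JDK's bundled parser, and I'm assuming Android's parser behaves the same way in this respect. The names LenientSaxDemo, SilentHandler and parsesCompletely are made up for illustration and are not my real classes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LenientSaxDemo {

    /** Hypothetical handler: receives content events and silently drops every error report. */
    static class SilentHandler extends DefaultHandler {
        @Override public void warning(SAXParseException e) { /* ignored */ }
        @Override public void error(SAXParseException e) { /* ignored */ }
        @Override public void fatalError(SAXParseException e) { /* ignored, never rethrown */ }
    }

    /** Returns true only if parse() reaches the end of the input without throwing. */
    static boolean parsesCompletely(String xml) throws Exception {
        XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        SilentHandler h = new SilentHandler();
        r.setErrorHandler(h);
        r.setContentHandler(h);
        try {
            r.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser gives up here even though the handler did not throw.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] samples = {
            "<p>one<br>two</p>",            // tag that is never closed
            "<p><b><i> text </b></i></p>",  // closing tags in the wrong order
            "<p>odds & ends</p>",           // bare ampersand
            "<p>odds &amp; ends</p>"        // well-formed control case
        };
        for (String s : samples) {
            System.out.println(parsesCompletely(s) + "  " + s);
        }
    }
}
```

What I see is that the parser stops at the first well-formedness error in each broken sample, no matter what the handler does. What I'm looking for is a way to make the first three samples behave like the last one, or at least deliver the remaining events.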
Thanks!