A question about the SAX XML parser on Android, using Java: I need to parse XML files that I download from the web and have no control over. Some contain errors and cause the parser to abort with errors like "mismatched tag" or "not well-formed (invalid token)".

Those errors don't matter to me: I want to ignore them and keep going, since I can handle the broken XML structure myself. But I cannot fix the XML files; they are not mine. How can I tell SAX on Android (class org.xml.sax.XMLReader) not to throw an exception and to keep going? Attaching an ErrorHandler didn't work, and catching the exception is of no use because I can't resume parsing where it stopped.

My XML is not HTML, but here are some (X)HTML examples where browsers ignore errors and keep going. I want to do this too.

  • Browsers are fine with "<br>" instead of "<br/>" even though the tag is never closed.
  • "<b><i> text </b></i>" works even though the closing tags are in the wrong order.
  • "odds & ends" is accepted despite the invalid token; "odds &amp; ends" would be correct.

I'd prefer not to write my own parser and deal with character set conversions and all that. I don't need to validate the XML. Here's my code, reduced to the essentials:

XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
r.setErrorHandler(new MyLenientErrorHandlerThatNeverThrows());
r.setContentHandler(new MyImporterThatExtendsDefaultHandler());
r.parse(new InputSource(new BufferedReader(...)));
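To see why the lenient ErrorHandler alone doesn't help, here is a minimal, self-contained sketch using the desktop JDK's built-in SAX parser. This is an assumption-laden stand-in: Android's default SAX implementation is built on Expat (hence messages like "mismatched tag"), so details may differ there. The class and method names (LenientSaxDemo, RecordingHandler, parseAborts) are hypothetical. The handler's fatalError() swallows every error, yet parse() still throws and no events after the error reach the content handler. The SAX ErrorHandler contract allows exactly this: after fatalError() the document is considered unusable and the parser is free to stop.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LenientSaxDemo {

    // Hypothetical stand-in for MyLenientErrorHandlerThatNeverThrows plus a
    // content handler: records element names and counts fatal errors.
    static class RecordingHandler extends DefaultHandler {
        final List<String> elements = new ArrayList<>();
        int fatalErrors = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements.add(qName);
        }

        @Override
        public void fatalError(SAXParseException e) {
            fatalErrors++; // swallow the error instead of rethrowing it
        }
    }

    // Returns true if parse() still aborted although the handler never rethrows.
    static boolean parseAborts(String xml, RecordingHandler handler) {
        try {
            XMLReader r = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            r.setErrorHandler(handler);
            r.setContentHandler(handler);
            r.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true; // parser gave up (a SAXParseException on the JDK)
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<doc><a>odds & ends</a><b>more</b></doc>", // bare ampersand
            "<doc><b><i> text </b></i><c/></doc>"        // misnested end tags
        };
        for (String xml : samples) {
            RecordingHandler h = new RecordingHandler();
            boolean aborted = parseAborts(xml, h);
            System.out.println(xml + " -> aborted=" + aborted
                    + ", fatalErrors=" + h.fatalErrors + ", elements=" + h.elements);
        }
    }
}
```

On a typical JDK both samples report that parsing aborted even though the handler never threw, and the elements after the broken spot (b and c) never appear in the recorded list.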

Thanks!

Answer

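OK, it appears it can't be done. SAX reports errors through the ErrorHandler, but after a fatal (well-formedness) error the parser is free to stop, and the Android one does. So SAX supports error detection but not error recovery, which makes it a poor fit for robust code in this case. I got it to work by replacing SAX with XmlPullParser, which lets you wrap the call that reads the next token, xpp.next(), in a try-catch block: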

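import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

try {
    XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
    XmlPullParser xpp = factory.newPullParser();
    xpp.setInput(in);                // in: a Reader over the downloaded XML
    int type = xpp.getEventType();
    int errors = 0;                  // consecutive parse errors
    while (type != XmlPullParser.END_DOCUMENT) {
        switch (type) {
          case XmlPullParser.START_TAG: startTag(xpp);             break;
          case XmlPullParser.END_TAG:   endTag(xpp);               break;
          case XmlPullParser.TEXT:      characters(xpp.getText()); break;
        }
        try {
            type = xpp.next();
            errors = 0;
        } catch (XmlPullParserException e) {
            // Ignore the broken markup and try the next token. Set type to a
            // non-event value (-1) so the previous event is not dispatched
            // again, and give up if the parser seems stuck at one spot.
            if (++errors > 100) break;
            type = -1;
        }
    }
} catch (Exception e) {
    // Parser setup or I/O failure: log or report it rather than ignore it.
}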