
I'm trying to parse some XML that is invalid because its attribute values are not in quotes. Is there any way of getting around this? A simple example is below, along with the Java code.

XML

<car id=1>
.
.
</car>

Java

  SAXParserFactory factory = SAXParserFactory.newInstance();
  factory.setValidating(false);
  SAXParser saxParser = factory.newSAXParser();
  saxParser.parse(page, handler);  // page is an InputStream containing the XML

Thanks.

JCS

1 Answer


What you have is a well-formedness issue, not a validation issue (the code you posted only disables validation). XML parsers require the XML to be well-formed and are mostly written to forgive only validation issues. You may have a better chance with HTML parsers such as jsoup, which are forgiving about well-formedness and try to auto-correct such problems.

Read this article to understand the difference between well-formedness and validity.
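
To see the distinction in practice, here is a minimal sketch using only the JDK's built-in SAX parser. The class name `WellFormednessDemo` and the helper `isWellFormed` are made up for illustration and are not part of any library. It suggests that `setValidating(false)` has no effect on well-formedness checks: the unquoted attribute should still be rejected with a `SAXParseException`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormednessDemo {

    // Hypothetical helper: returns true if the SAX parser accepts the input
    // as well-formed XML, false if it reports a parse error.
    public static boolean isWellFormed(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // disables DTD validation only
        SAXParser saxParser = factory.newSAXParser();
        try {
            saxParser.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // Well-formedness errors are fatal and surface here.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Unquoted attribute value, as in the question.
        System.out.println(isWellFormed("<car id=1></car>"));
        // Same element with the attribute value quoted.
        System.out.println(isWellFormed("<car id=\"1\"></car>"));
    }
}
```

If the first call reports a failure and the second succeeds, the input has to be repaired (or read with a lenient parser) before a standard XML parser can handle it.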

Aravind Yarram
  • Thanks for that. I have used jsoup before; how would I go about auto-correcting the XML using jsoup? – JCS Jan 22 '13 at 17:33
  • I am not saying that jsoup will auto-correct. I am suggesting that you look around for auto-correcting HTML parsers (like http://ccil.org/~cowan/XML/tagsoup/) which you can use to auto-correct the bad XML you have. – Aravind Yarram Jan 22 '13 at 17:39
  • @Pangea Just tried to parse the XML in jsoup (using Jsoup.parse(string)) and it did correct the XML, thanks. – JCS Jan 22 '13 at 17:55