3

I need to parse an XML chunk that I receive without a root element, namespace declarations, or entity declarations, even though the content relies on all three.
So far I've been using Dom4j and wrapping the content myself, but new entities and namespaces keep appearing, and the DTD/Schema of the content is not accessible.

Given that I don't control the source of the XML, is there any Java XML parser that will tolerate these errors?

  1. Absence of root element

  2. Unbound namespaces

  3. Undeclared entities

Vijay
Chedy2149
  • 2
    The proper solution to your problem would be to contact the source and ask them to comply with standards. Whatever they are giving you sure ain't XML. If TagSoup fails you, you could write your own parser. Check the [ANTLR4](http://www.antlr.org/wiki/display/ANTLR4/Home) parser generator. Its [reference book](http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference) has XML parsing examples. – predi Aug 23 '13 at 10:31
  • XML does not require a namespace declaration. – Raedwald Aug 23 '13 at 12:07
  • But the XML data that I manipulate uses namespaces – Chedy2149 Aug 23 '13 at 14:24

2 Answers

2

You can try using TagSoup, which forgives many errors in the markup.

To work around the absence of a root element, you can always wrap the XML chunk in a root element of your own before parsing it.
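The wrapping idea can be sketched with the JDK's built-in parser alone (this is not TagSoup). The sketch below assumes the chunk is otherwise well-formed and carries no XML declaration of its own. The class name `FragmentParser`, the method `parseFragment`, and the `wrapper` element are hypothetical names chosen for illustration. Because the real entity values are unknown, each undeclared entity is given a placeholder value (its own name in brackets), which is purely an assumption. Namespace awareness is switched off so that unbound prefixes are accepted as part of plain element names.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class FragmentParser {
    // Matches named entity references such as &nbsp; (numeric ones like &#160; are skipped).
    private static final Pattern ENTITY_REF =
            Pattern.compile("&([A-Za-z_][A-Za-z0-9._-]*);");

    // The five entities every XML parser already knows.
    private static final Set<String> PREDEFINED =
            new HashSet<>(Arrays.asList("lt", "gt", "amp", "apos", "quot"));

    public static Document parseFragment(String chunk) throws Exception {
        // Collect every named entity the chunk references.
        Set<String> names = new LinkedHashSet<>();
        Matcher m = ENTITY_REF.matcher(chunk);
        while (m.find()) {
            if (!PREDEFINED.contains(m.group(1))) {
                names.add(m.group(1));
            }
        }

        // Declare each one in an internal DTD subset with a placeholder value
        // (assumption: the real replacement text is not known).
        StringBuilder xml = new StringBuilder("<!DOCTYPE wrapper [");
        for (String name : names) {
            xml.append("<!ENTITY ").append(name)
               .append(" \"[").append(name).append("]\">");
        }
        // Supply the missing root element (hypothetical name "wrapper").
        xml.append("]><wrapper>").append(chunk).append("</wrapper>");

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Without namespace awareness, "ns:item" is just an element name,
        // so an unbound prefix is not an error.
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml.toString())));
    }
}
```

With this approach, elements keep their prefixed names, so a lookup such as `doc.getElementsByTagName("ns:item")` has to use the full prefixed name. It does not repair genuinely malformed markup such as unclosed tags, which is where a lenient parser like TagSoup would still be needed.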

Andrey Adamovich
  • That still leaves the namespace binding and entity declaration problems. – Chedy2149 Aug 23 '13 at 10:10
  • 1
    TagSoup will simply suppress all the namespaces. It also supports 2000+ entities already. – Andrey Adamovich Aug 23 '13 at 10:12
  • TagSoup seems interesting, but how do I use it? Are there any tutorials? Moreover, does it have querying capability (XPath)? – Chedy2149 Aug 23 '13 at 10:37
  • 1
    There is also another library called jsoup, which provides querying capabilities. This question also gives some more links: http://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers/3154281#3154281 – Andrey Adamovich Aug 23 '13 at 16:09
  • I cannot find TagSoup on ccil.org now, but I found a fork of it on GitHub: https://github.com/orbeon/tagsoup – Abhishek Oza Mar 12 '18 at 09:56
0

I think all major Java XML parsers have strict requirements such as a root element. The simple way around all this is to write your own Java XML parser. If you are using the XML purely as a config file, then I suggest you look into using Java Properties.

Thanks, Reece