I was recently wondering about a good library for XML manipulation in Java: A nice Java XML DOM utility

Before re-inventing the wheel by porting jQuery to Java in jOOX, I checked out jsoup and gwtquery.

On closer inspection, however, I can see:

  • jsoup does not operate on a standard org.w3c.dom document structure. They rolled their own implementation. I checked out the code and I doubt that it is as efficient and tuned as Xerces, for instance. For my use-cases, performance is important
  • jsoup seems tightly coupled with HTML. I only want to operate on XML, no HTML structure, no CSS
  • gwtquery is coupled with GWT. I'm not sure how tightly

Does anyone have experience with these libraries when using them only for server-side XML, not for HTML?

I'm interested in

  • Performance benchmarks (maybe comparing it with standard DOM / XPath)
  • Compatibility experience (easy to import/export to standard DOM?)
Lukas Eder

1 Answer

Without an answer after one month, I think that my own library will resolve my problems best:

http://www.jooq.org/products/jOOX

Lukas Eder
  • What made you go that route? jSoup's been pretty clutch for me so far. – Kyle Clegg Jun 02 '12 at 09:39
  • @Kyle: jsoup (as in *jsoup: Java HTML Parser*) doesn't support standard DOM (as in `org.w3c.dom`). It's entirely focused on HTML... As far as my question was concerned, jOOX seemed a better match for my needs – Lukas Eder Jun 02 '12 at 09:52
  • Gotcha. You may be right. After some more work with jSoup today it's definitely geared towards parsing HTML (not what I need for this project). I was however able to do everything I need though, this documentation page being most helpful: http://jsoup.org/cookbook/extracting-data/dom-navigation. – Kyle Clegg Jun 03 '12 at 05:14
  • @Kyle: Yes yes, you can do *some* DOM manipulation, of course. But as soon as you'd like to combine things with SAX, JAXB, XPath, transformation, XSLT, and all other standard technologies, you'll get to jsoup's limits quite quickly... – Lukas Eder Jun 03 '12 at 10:12
  • @Lukas, what workflow would you suggest to transform real-world HTML into XML, so that jOOX can be used? – Dr. Max Völkel Aug 12 '16 at 09:53
  • @xamde: You can probably export jsoup content using `Element.html()` and parse that into jOOX, or you could write an `org.w3c.dom` implementation that binds to jsoup... You could roll your own or participate in this feature request that I've just created: https://github.com/jhy/jsoup/issues/745 – Lukas Eder Aug 12 '16 at 12:08
  • @xamde: Huh, in fact, this API exists: https://jsoup.org/apidocs/org/jsoup/helper/W3CDom.html#fromJsoup-org.jsoup.nodes.Document- – Lukas Eder Aug 12 '16 at 16:10
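
To illustrate the point made in the comments about standard `org.w3c.dom` interoperability, here is a minimal sketch using only the JDK's built-in JAXP classes. It assumes the content is already available as an `org.w3c.dom.Document` (parsed directly here; a document obtained through a bridge such as the `W3CDom` helper linked above should behave the same way, though that is not shown). The class name `StandardDomSketch`, its method names, and the sample XML are made up for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jOOX or jsoup.
public class StandardDomSketch {

    // Parse an XML string into a standard org.w3c.dom.Document.
    public static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluate an XPath expression and collect the text content of each match.
    public static List<String> selectText(Document doc, String expression) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(expression, doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Serialize the document with an identity transform (no stylesheet).
    public static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<library><book><title>A</title></book><book><title>B</title></book></library>");
        System.out.println(selectText(doc, "//book/title")); // prints [A, B]
        System.out.println(serialize(doc));
    }
}
```

Because every step takes or returns a plain `org.w3c.dom.Document`, the same object could be handed on to JAXB or an XSLT stylesheet without conversion, which is the kind of combination a library with its own node model makes harder.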