9

What is the fastest and most efficient way to create XML documents in Java? There is a plethora of libraries out there (Woodstox, XOM, XStream...), and I'm wondering if anyone has any input. Should I go with a code generation approach (since the XML schema is well known), or with a reflection approach at run time?

Edited with Additional information:

  1. A well-defined XML Schema is available and rarely changes
  2. The requirement is to convert Java objects to XML, and not vice versa
  3. Thousands of Java objects are converted to XML per second
  4. Code generation, code complexity, configuration, maintenance etc. are secondary to performance.
Andrei Petrenko
arrehman
  • How do you define efficient? Least memory usage? Fewest lines of code to use? Fastest at doing what? Marshaling? Unmarshaling large documents? Pretty printing? What are you doing that requires the "fastest" library? Be sure to choose a library based on what's really important and not on a criterion that doesn't matter. You may find it is better to choose a library that is "fast enough" because it is easier to use than a library that is the "fastest", yet is a complete headache to use and maintain. – Paul Jan 28 '12 at 17:44
  • "Most efficient" is not clear. Most efficient in memory or in CPU? Please clarify. – Erdinç Taşkın Jan 28 '12 at 17:45
  • possible duplicate of [Which xml serialization library is performance oriented?](http://stackoverflow.com/questions/5918665/which-xml-serialization-library-is-performance-oriented) – skaffman Jan 28 '12 at 17:49
  • Paul: My requirement is to be really fast, taking as little time as possible. The amount of coding, configuration, or maintenance does not matter. An XML Schema is available as well. By efficient I mean reasonable memory usage, works flawlessly, no memory leaks (since it processes thousands of records per second), well known in the industry, etc. Hope this clarifies. – arrehman Jan 28 '12 at 17:53

6 Answers

15

If I were to create very simple XML content, I would stick to the JDK API only, introducing no third-party dependencies.

So for simple XML, or if I were to map XML files to Java classes (or vice versa), I would go for JAXB. See this tutorial to see how easy it is.

Now.

If I were to create more sophisticated XML output with a fixed structure, I would use a templating engine, Freemarker perhaps. Thymeleaf looks nice as well.

And finally.

If I were to create huge XML files very efficiently, I would use a SAX parser.
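SAX itself only defines parsing callbacks, so "using SAX" for output usually means feeding SAX events into a serializer. The following is a minimal sketch of that idea using the JDK's `TransformerHandler`; the class name `SaxWriterSketch`, the method `writeOrder`, and the `order` element are made up for illustration and are not part of this answer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical example class: emits one element by firing SAX events
// at a TransformerHandler, which serializes them as they arrive.
public class SaxWriterSketch {

    static String writeOrder(String id, String amount) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            // Leave out the XML declaration to keep the output short.
            handler.getTransformer()
                   .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            handler.startDocument();
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "id", "id", "CDATA", id);
            handler.startElement("", "order", "order", attrs);
            char[] text = amount.toCharArray();
            // The serializer escapes markup characters in the text.
            handler.characters(text, 0, text.length);
            handler.endElement("", "order", "order");
            handler.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeOrder("553", "$140.00"));
    }
}
```

For a genuinely huge file, the `StreamResult` would wrap a file or socket stream instead of a `StringWriter`, so that nothing is buffered in memory.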

I hope you can see now that you have plenty of options; choose the best match for your needs :)

And have fun!

ŁukaszBachman
  • I wouldn't consider `StringBuilder` to be the simple way to create valid xml. I think that's the most complex way, because you have to do everything yourself. Look how easy it is to create xml from objects using EclipseLink MOXy, for example: [MOXy: Getting Started](http://wiki.eclipse.org/EclipseLink/Examples/MOXy/GettingStarted/TheBasics) That's way easier than using `StringBuilder`. – Paul Jan 28 '12 at 17:53
  • @Paul, good point. By saying simplest I meant introducing no third-party dependencies and avoiding SAX. I will edit my response. – ŁukaszBachman Jan 28 '12 at 17:57
  • Thanks Lukas. As mentioned in the previous comment, I want faster performance. The XML is not huge. I don't need mapping of XML files to Java objects. So this helps to narrow it down. I like the idea of templating engines, which I will look into. I will post my experience here. – arrehman Jan 28 '12 at 17:57
  • The example cited by @Paul would work with any JAXB implementation, and one is included in the JDK/JRE starting with Java SE 6. – bdoughan Jan 28 '12 at 18:19
  • You guys are right. I had overlooked the fact that JAXB is included in the JDK. I have modified my response. – ŁukaszBachman Feb 01 '12 at 14:07
  • +1 for including the templating alternative, this is often ignored – Christophe Roussy Oct 01 '14 at 12:28
  • You mention using the SAX parser to write huge XML files. How do you want to do that? The SAX parser does not have any output methods. It is fine for reading large files but not for writing. You might be thinking of using the SAX framework as in http://stackoverflow.com/questions/4898590/generating-xml-using-sax-and-java but then you should correct your answer to address this point. – Anthill Jan 30 '16 at 18:03
6

Try Xembly, a small open source library that makes creating XML easy and intuitive:

String xml = new Xembler(
  new Directives()
    .add("root")
    .add("order")
    .attr("id", "553")
    .set("$140.00")
).xml();

Xembly is a wrapper around the native Java DOM, and is a very lightweight library (disclosure: I'm its developer).

yegor256
  • I found this post helpful too: http://www.yegor256.com/2014/04/09/xembly-intro.html – Bernie Noel Jan 01 '15 at 13:01
  • Your library looks useful, but is it efficient? It's nice to get rid of verbosity when appropriate, but the question is about the fastest and most efficient way to write XML. Looking at the example above, I would say that it's not a good solution for writing large XML files efficiently. Even though I like your library for other use cases, I feel that suggesting it as the fastest and most efficient one is not accurate, and that you are promoting your product instead of providing the best answer. But correct me if I am wrong. –  May 31 '16 at 00:26
2

The nicest way I know is using an XPath engine that is able to create Nodes. XMLBeam is able to do this (in a JUnit test here):

public interface Projection {

    @XBWrite("/create/some/xml/structure[@even='with Predicates']")
    void demo(String value);
}

@Test
public void demo() {
    Projection projection = new XBProjector(Flags.TO_STRING_RENDERS_XML).projectEmptyDocument(Projection.class);
    projection.demo("Some value");
    System.out.println(projection);
}

This program prints out:

<create>
   <some>
      <xml>
        <structure even="with Predicates">Some value</structure>
      </xml>
   </some>
</create>
Cfx
2

Use XMLStreamWriter.

I ran a microbenchmark serializing one million of these:

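import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "Root")
public class Root {
    @XmlAttribute
    public String attr;
    @XmlElement(name = "F1")
    public String f1;
    @XmlElement(name = "F2")
    public String f2;
}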

with these results:

JAXB: 3464 millis (<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Root attr="at999999"><F1>x999999</F1><F2>y999999</F2></Root>)
XMLStreamWriter: 1604 millis (<?xml version="1.0" ?><Root attr="at999999"><F1>x999999</F1><F2>y999999</F2></Root>)
Xembly: 25832 millis (<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Root attr="at999999">
<F1>x999999</F1>
<F2>y999999</F2>
</Root>
)
StringBuilder: 60 millis (<?xml version="1.0" encoding="UTF-8"><Root attr=")at999999"><F1>x999999</F1><F2>y999999</F2></Root>)
StringBuilder w/escaping: 3806 millis (<?xml version="1.0" encoding="UTF-8"><Root attr="at999999"><F1>x999999</F1><F2>y999999</F2></Root>)

which gives:

  • StringBuilder: 60 ms
  • XMLStreamWriter: 1604 ms
  • JAXB: 3464 ms
  • StringBuilder w/very primitive escaping: 3806 ms
  • Xembly: 25832 ms
  • And a lot of others I didn't try

Plain StringBuilder is the fastest, but only because it doesn't go through all the text searching for ", &, <, and > and converting them into XML entities, so its output is only well-formed if the data never contains those characters.
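The benchmark code itself is not shown, so the following is only a sketch of what serializing the `Root` shape above with `XMLStreamWriter` might look like. The class name `StaxRootWriter` and the method `toXml` are invented for illustration; the StAX calls are the standard `javax.xml.stream` API that ships with the JDK.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: writes <Root attr=".."><F1>..</F1><F2>..</F2></Root>.
public class StaxRootWriter {

    // Looking up the factory is relatively expensive, so it is done once here.
    // Whether one factory may be shared across threads depends on the
    // implementation; this sketch only uses it from a single thread.
    private static final XMLOutputFactory FACTORY = XMLOutputFactory.newInstance();

    static String toXml(String attr, String f1, String f2) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = FACTORY.createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("Root");
            w.writeAttribute("attr", attr);   // attribute values are escaped by the writer
            w.writeStartElement("F1");
            w.writeCharacters(f1);            // text content is escaped by the writer
            w.writeEndElement();
            w.writeStartElement("F2");
            w.writeCharacters(f2);
            w.writeEndElement();
            w.writeEndElement();              // closes Root
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("at999999", "x999999", "y999999"));
    }
}
```

Each start element must be closed by hand in the right order, which is the main source of mistakes with this API; in exchange, escaping is handled by the writer rather than by your own code.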

Petr Hudeček
  • Hi Petr, this is the most relevant answer, to be honest. But which XMLStreamWriter was it? I see that JDK 8 only contains it as an interface that has some "com.sun." implementations, and Apache Commons seems to have one as well. Which one did you use for this benchmark? Thanks – 62mkv Feb 09 '21 at 09:38
2

Firstly, it's important that the serialization is correct. Hand-written serializers usually aren't. For example, they have a tendency to forget that the string "]]>" can't appear in a text node.
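To illustrate the kind of detail a hand-written serializer has to get right, here is a minimal sketch of a text escaper. The names `XmlEscapeSketch` and `escapeText` are hypothetical. It escapes every `>` rather than only the one in "]]>", which is the simplest way to cover that case, and it deliberately ignores other concerns such as characters that are not legal in XML at all.

```java
// Hypothetical helper: escapes the characters that must not appear raw
// in XML text nodes or double-quoted attribute values.
public class XmlEscapeSketch {

    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                // Escaping every '>' also neutralizes the "]]>" sequence.
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("a]]>b & \"c\" < d"));
    }
}
```

A production serializer would also have to reject or replace control characters that XML 1.0 forbids, which this sketch does not attempt.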

It's not too difficult to write your own serializer that is both correct and fast, if you're a capable Java programmer, but since some very capable Java programmers have been here before I think you're unlikely to beat them by a sufficient margin to make it worth the effort of writing your own code.

Except perhaps that most general-purpose libraries might be slowed down a little by offering serialization options - like indenting, or encoding, or like choosing your line endings. You might just squeeze an extra ounce of performance by avoiding unwanted features.

Also, some general-purpose libraries might check the well-formedness of what you throw at them, for example checking that namespace prefixes are declared (or declaring them if not). You might make it faster if it does no checking. On the other hand, you might create a library that is fast, but a pig to work with. Putting performance above all other objectives is almost invariably a mistake.

As for the performance of available libraries, measure them, and tell us what you find out.
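As a starting point for such measurements, a very crude timing loop might look like the sketch below. The class `CrudeTimer` and its method `nanosPerOp` are invented for illustration, and the approach is an assumption about what is good enough for a first look: it runs a warm-up phase and then averages over many iterations, but it does not control for JIT or GC effects the way a dedicated harness such as JMH does.

```java
import java.util.function.Supplier;

// Hypothetical helper: average wall-clock time per call of a serialization task.
public class CrudeTimer {

    // Written to after each loop so the JIT cannot discard the work as dead code.
    static volatile long sink;

    static double nanosPerOp(Supplier<String> task, int warmup, int iterations) {
        long total = 0;
        // Warm-up: give the JIT a chance to compile the hot path first.
        for (int i = 0; i < warmup; i++) {
            total += task.get().length();
        }
        sink = total;

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += task.get().length();
        }
        long elapsed = System.nanoTime() - start;
        sink = total;

        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        double ns = nanosPerOp(() -> "<Root attr=\"a\"/>", 10_000, 100_000);
        System.out.println("approx. ns per op: " + ns);
    }
}
```

Numbers from a loop like this are only useful for spotting order-of-magnitude differences between libraries.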

Michael Kay
1

Inspired by Petr's answer, I spent the better part of a day implementing a similar benchmark, reading a lot about JMH in the process. The project is here: https://github.com/62mkv/xml-serialization-benchmark

and the results were as follows (the commas in the scores are decimal separators):

Benchmark                                          (N)   Mode  Cnt    Score    Error  Units
XmlSerializationBenchmark.testWithJaxb              50  thrpt    5  216,758 ± 99,951  ops/s
XmlSerializationBenchmark.testWithXStream           50  thrpt    5   40,177 ±  1,768  ops/s
XmlSerializationBenchmark.testWithXmlStreamWriter   50  thrpt    5  520,360 ± 14,745  ops/s

I did not include Xembly, because by its description it looked like overkill for this particular case.

I was a bit surprised that XStream performed so poorly, given that it comes from ThoughtWorks, but that might just be because I did not tune it well enough for this particular case. The default StAX implementation of XMLStreamWriter in the Java 8 standard library is hands down the best in terms of performance. In terms of developer experience, XStream is the simplest to use, whereas XMLStreamWriter takes considerably more error-prone effort to implement fully; JAXB takes a well-deserved second place in both categories.

PS: Feedback and suggestions to improve the suite are very much welcome!

62mkv