
I measured the time cost of unmarshalling XML to objects with Jaxb2, using a large (1.7 MB) XML payload with fairly long (48-character) tag names. Running JProfiler in sampling mode, I observed that string interning accounted for a substantial portion of the time spent.

I did some research and found that Jaxb can be run in a mode where it doesn't intern strings. My theory was that, in some cases, skipping interning during unmarshalling could improve performance because the unmarshaller no longer has to hash every tag name string, at the expense of using more heap memory.
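To make the cost in question concrete, here is a minimal sketch (not taken from the linked test project) of what `String.intern()` does and a rough way to time it. The class and method names are hypothetical, the 48-character names only imitate the tag-name length mentioned above, and the timing includes allocating the fresh copy, so treat the number as an indication rather than a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: illustrates String.intern() on 48-character names.
class InternCostSketch {

    // Builds `count` distinct names, each exactly 48 characters long (zero-padded numbers).
    static List<String> makeNames(int count) {
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            names.add(String.format("%048d", i));
        }
        return names;
    }

    // True when two distinct-but-equal String objects map to one canonical instance via intern().
    static boolean internCanonicalizes(String name) {
        String copyA = new String(name.toCharArray());
        String copyB = new String(name.toCharArray());
        return copyA != copyB && copyA.intern() == copyB.intern();
    }

    // Rough wall-clock time for interning a fresh copy of every name, `rounds` times over.
    // The copy allocation is part of the measured time; this is not a rigorous benchmark.
    static long timeInternNanos(List<String> names, int rounds) {
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (String name : names) {
                new String(name.toCharArray()).intern();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<String> names = makeNames(3000);
        System.out.println("canonicalizes: " + internCanonicalizes(names.get(0)));
        System.out.println("intern time (ns): " + timeInternNanos(names, 100));
    }
}
```

Each `intern()` call has to hash the full string and look it up in the JVM's string table, which is the work the theory above hopes to avoid.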

To suppress Jaxb's interning behavior, I set the "org.codehaus.stax2.internNames" and "org.codehaus.stax2.internNsUris" properties on my Fastinfoset "StAXDocumentParser" (which implements XMLStreamReader). It's not 100% clear to me why you must set these to "true" in order to prevent Jaxb from interning strings, but that is how it works.

These JUnit-driven tests are what I used to conclude that disabling Jaxb's string interning behavior makes a big performance difference:

https://github.com/gjd6640/fastinfoset-performance-evaluation

So my question is multi-part:

1) Am I misunderstanding something important and shouldn't be trying to disable Jaxb's string interning behavior in the first place?

2) Is there a better way to direct Jaxb not to intern strings? The "StAXManager" class doesn't allow you to set these Woodstox-oriented properties. For this test I ended up extending StAXManager as shown below to hack around the problem. This is a hack that I'd prefer not to use in production. I suspect that the idea here is that when Jaxb is unmarshalling from a Woodstox stream it looks to see if Woodstox is already doing interning and when "yes" Jaxb reacts by disabling that step of the process. I'm cheating by piggybacking on that logic in the Jaxb library so would like a better way to go about this.

package com.sun.xml.fastinfoset.stax;
public class JaxbStringInternSuppressionStaxManager extends StAXManager {
    public JaxbStringInternSuppressionStaxManager() {
        // Add to the allowable list of feature names so that the user may set these "StAXInputFactory" properties
        super.features.put("org.codehaus.stax2.internNames", null);
        super.features.put("org.codehaus.stax2.internNsUris", null);
    }
}

Update:

As usual, "A question well-put is half-answered". I've just noticed while drafting this question that "com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector" checks to see if class "com.sun.xml.internal.fastinfoset.stax.StAXDocumentParser" is assignable from the XMLStreamReader that you're using and if so does not enable string interning. In my case my stream object is a "com.sun.xml.fastinfoset.stax.StAXDocumentParser" so interning doesn't get disabled. Now the question is "why does it do this only for the internal flavor of the Fastinfoset library?" Maybe I'll find the answer by carefully reading this post.
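A small sketch of why that check misses my reader: two classes with the same simple name in different packages are unrelated types, so `isAssignableFrom` on one never matches an instance of the other. The nested holder classes below are hypothetical stand-ins for the two packages, and `Reader` stands in for XMLStreamReader; this is not the actual Jaxb source.

```java
// Hypothetical sketch of an isAssignableFrom check against same-named classes.
class AssignabilitySketch {

    // Stand-in for the shared interface (XMLStreamReader in the real code).
    interface Reader {}

    // Stand-ins for the ".internal." package and the standalone library package.
    static class InternalPkg { static class StAXDocumentParser implements Reader {} }
    static class ExternalPkg { static class StAXDocumentParser implements Reader {} }

    // Mirrors the kind of check described above: only the "internal" type matches.
    static boolean skipsInterning(Reader reader) {
        return InternalPkg.StAXDocumentParser.class.isAssignableFrom(reader.getClass());
    }

    public static void main(String[] args) {
        System.out.println(skipsInterning(new InternalPkg.StAXDocumentParser())); // prints "true"
        System.out.println(skipsInterning(new ExternalPkg.StAXDocumentParser())); // prints "false"
    }
}
```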

Also, if there's a better forum for this sort of question such as an active developer user group please share that info and I'll see about linking them over to this post so that the right people will see this question.

  • Addendum: I'm using JDK: 64 bit jdk1.8.0_121 which appears to bundle classes from Oracle's "com.sun.xml.bind:jaxb-impl" library version "2.1-b02-fcs". – Gordon Daugherty Dec 05 '17 at 16:35

2 Answers


I wouldn't necessarily trust a profiler or a test without also measuring the real use case with and without interning, so be a bit skeptical. However, there are some issues with intern. In particular, it uses a fixed-size pool, so when the pool fills up the would-be constant-time hash lookups degrade to searching linked lists. See http://java-performance.info/string-intern-in-java-6-7-8/ for a longer discussion.

In short you can try to change the pool size with -XX:StringTableSize=n (where n should ideally be prime) and see what happens.

Use -XX:+PrintStringTableStatistics to see how the pool was used when the program terminates and try different sizes.
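If you want something small to experiment with before touching the real application, a sketch like the following interns a configurable number of distinct strings and keeps them reachable, so the statistics printed at exit reflect them. The class name and the "element-name-" prefix are made up for illustration. Compile it and run it with the two flags above, for example `java -XX:StringTableSize=n -XX:+PrintStringTableStatistics StringTableWorkload 200000`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical workload for observing the JVM string table under different settings.
class StringTableWorkload {

    // Interns `count` distinct strings and returns them so they stay reachable.
    static List<String> internDistinct(int count) {
        List<String> interned = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            interned.add(("element-name-" + i).intern());
        }
        return interned;
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        List<String> kept = internDistinct(count);
        System.out.println("interned " + kept.size() + " distinct strings");
    }
}
```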

EDIT: this was an attempt to answer "is there a better way" (i.e. make intern faster). I'll leave the other question to someone more qualified.

ewramner
  • This is fascinating stuff. Thanks for the info. I briefly tried increasing StringTableSize first by about 2x and then by about 100x and didn't see any improvement. Based on the stats that I'm seeing it looks like my test is only using 3090 of the JVM's default hashmap size of 60k:
        StringTable statistics:
        Number of buckets  : 60013 = 480104 bytes, avg 8.000
        Number of entries  : 3090 = 74160 bytes, avg 24.000
        Number of literals : 3090 = 256072 bytes, avg 82.871
        Total footprint    : = 810336 bytes
    – Gordon Daugherty Dec 05 '17 at 17:04

Solution option 1: Simple approach that swaps the entire app over to a different jaxb implementation

Pull in jaxb-impl to use a version of Jaxb that performs better with this Fastinfoset library:

<!-- Both of these libs must be here in order to get performant behavior out of Jaxb by default. -->
<dependency>
    <groupId>com.sun.xml.fastinfoset</groupId>
    <artifactId>FastInfoset</artifactId>
    <version>1.2.13</version>
    <scope>compile</scope>
</dependency>
<dependency> <!-- This artifactId also exists under javax.xml.bind but it appears that nobody uses that one... -->
    <groupId>javax.xml</groupId>
    <artifactId>jaxb-impl</artifactId>
    <version>2.1</version>
    <scope>runtime</scope>
</dependency>
<!-- End: Both of these libs... -->

This will have the side effect of changing the jaxb version used by the rest of your code. In some situations this may not be desirable. For example, if you're creating a shared library that needs to be usable in various apps, it is rude to change this functionality when they pull in your shared component.

Solution option 2: Use the JVM's jaxb implementation and a performance hack to trick it into trusting that the strings are already interned (more complex to implement)

  • Use "maven-shade-plugin" to shade and repackage the Fastinfoset library's classes. The result should be a logic-less maven component. This is optional and is meant to ensure that people using your Fastinfoset codec component won't have classpath collisions due to transitive dependencies pulled in by your codec library.
  • Create a my-fastinfoset-codec library that provides a simple API to encode and decode Fastinfoset payloads (consider using InputStreams and OutputStreams for arguments and XMLStreamReader for the return type of the decoder). Add a dependency on your repackaged Fastinfoset library. Note that if you use Eclipse it doesn't deal well with shaded libraries when m2e's "Workspace Resolution" is enabled so disable that for your codec project.
  • Add to my-fastinfoset-codec a class that extends the repackaged Fastinfoset library's "StAXManager". This class should facilitate setting the properties that tell jaxb that the XMLStreamReader that it was given has already interned the NS and tag name strings. Example is below:
    package myrepackagedfastinfosetclassespackageprefix.shaded.com.sun.xml.fastinfoset.stax;
    import myrepackagedfastinfosetclassespackageprefix.shaded.com.sun.xml.fastinfoset.stax.StAXManager;
    public class JaxbStringInternSuppressionStaxManager extends StAXManager {
        public JaxbStringInternSuppressionStaxManager() {
            // Add to the allowable list of feature names so that the user may set these "StAXInputFactory" properties
            super.features.put("org.codehaus.stax2.internNames", null);
            super.features.put("org.codehaus.stax2.internNsUris", null);
        }

        /**
         * This is an optimization. The FastInfoset libraries already intern strings and the JVM's jaxb implementation by default 
         * unnecessarily repeats that work. This is true at least for the 64 bit version of jdk1.8.0_121.
         * 
         * The way that this workaround works is by piggybacking on a Jaxb optimization for the Woodstox parser. When we set
         * these properties it tells jaxb that Woodstox has already interned the strings which causes it to disable its
         * string interning.
         * 
         * We did explore the cleaner option of pulling in the Maven "javax.xml:jaxb-impl" artifact as a dependency instead of using
         * the JVM's jaxb library. That external jaxb library when used with the FastInfoset library does perform substantially better
         * than the JVM's but isn't 100% as fast as the JVM's with interning disabled. The key reason that we quit exploring that solution
         * is that when you repackage (via maven-shade-plugin) the jaxb libraries they no longer work with our standard jaxb binding
         * maven components due to statements like "if ( instanceof my_repackaging_project.shaded.XMLElement)"
         * used during the data mapping process.
         */
        public JaxbStringInternSuppressionStaxManager enableTrickToStopJaxbFromInterningStrings() {
            super.setProperty("org.codehaus.stax2.internNames", true);
            super.setProperty("org.codehaus.stax2.internNsUris", true);
            return this;
        }
    }
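A variation on the same trick that I have not tested against Jaxb: rather than extending "StAXManager", wrap the reader in the JDK's `javax.xml.stream.util.StreamReaderDelegate` and answer the two property lookups yourself. This is only a sketch. It assumes Jaxb reads these properties through `XMLStreamReader.getProperty`, and it is only safe when the wrapped reader really does intern names, as described for Fastinfoset above. The class name is hypothetical, and the demo wraps the JDK's own parser purely as a stand-in for the Fastinfoset "StAXDocumentParser".

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

// Hypothetical wrapper that reports the two Woodstox-style interning properties as true.
class InternedNamesReader extends StreamReaderDelegate {
    static final String INTERN_NAMES = "org.codehaus.stax2.internNames";
    static final String INTERN_NS_URIS = "org.codehaus.stax2.internNsUris";

    InternedNamesReader(XMLStreamReader delegate) {
        super(delegate);
    }

    // Answers the two interning properties directly; everything else goes to the wrapped reader.
    @Override
    public Object getProperty(String name) {
        if (INTERN_NAMES.equals(name) || INTERN_NS_URIS.equals(name)) {
            return Boolean.TRUE;
        }
        return super.getProperty(name);
    }

    // Demo helper: wraps the JDK parser (a stand-in) and returns the first element's local name.
    static String firstElementName(String xml) {
        try {
            XMLStreamReader plain = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            XMLStreamReader wrapped = new InternedNamesReader(plain);
            wrapped.next(); // advance from START_DOCUMENT to the first START_ELEMENT
            return wrapped.getLocalName();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstElementName("<root><child/></root>"));
    }
}
```

In real use you would pass the wrapped reader to `Unmarshaller.unmarshal(XMLStreamReader)`. Note that wrapping changes the reader's concrete class, so any class-based checks inside Jaxb will see the wrapper rather than the Fastinfoset parser.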

Solution option 3: Enough people who have a JVM support contract with Oracle raise tickets asking for non-internal fastinfoset support of some sort.

I'd expect it to be fairly simple for Oracle to teach the JVM-provided jaxb implementation to determine from the given XMLStreamReader that this Fastinfoset implementation is configured to intern strings.

Solution possibility that didn't pan out: Repackage the two jars from solution 1 above

One can use "maven-shade-plugin" or similar to create new jars with custom-prefixed package names. This did work with these libraries after some fiddling. However, the end-result that I came to was that the repackaged jaxb libraries now wanted the jaxb-RI produced OXM objects to have annotations from the new shaded package name. Mine were built the standard way so my repackaged solution wouldn't map any data to my objects. I'm not willing to dictate that our OXM binding libraries use a repackaged jaxb library nor did I like this approach enough to explore ways to repackage more carefully so as not to change the package used for those annotations.

Solution option that I didn't explore:

Use the JVM's fastinfoset classes that have ".internal." in their package names. Those would likely perform well with the jaxb implementation that comes with the JVM but I refuse to expose "future me" to the support costs that come with using internal apis.