I've measured the time cost of unmarshalling XML to objects with Jaxb2, using a large (1.7 MB) XML payload with fairly long (48-character) tag names. With JProfiler running in sampling mode, I observed that string interning accounted for a substantial portion of the time spent.
I did some research and found that Jaxb can be run in a mode where it doesn't intern strings. My theory was that, in some cases, skipping interning during unmarshalling could improve performance because Jaxb would no longer have to hash every tag name string during the interning process. The expected cost is higher heap usage, since equal tag names would no longer share a single String instance.
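To make the trade-off concrete, here is a minimal JDK-only sketch of what interning buys and costs. It is not Jaxb code; the class and method names are made up for illustration. Two equal tag names built at runtime are separate heap objects until they are interned, and each intern() call has to look the string up in the JVM's string table.

```java
public class InternTradeoffSketch {

    /** Builds a fresh String at runtime, as a parser would when it reads a tag name. */
    public static String freshCopy(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String tag = "aFairlyLongElementNameUsedOnlyForThisIllustration";

        String a = freshCopy(tag);
        String b = freshCopy(tag);

        // Without interning: equal content, but two separate objects on the heap.
        System.out.println(a == b);        // false
        System.out.println(a.equals(b));   // true

        // With interning: one canonical instance, at the price of a
        // string-table lookup for every call.
        System.out.println(a.intern() == b.intern());  // true
    }
}
```

Whether the lookup cost outweighs the memory saving presumably depends on the payload, which is what the tests linked further down try to measure.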
To suppress Jaxb's interning behavior, I set the "org.codehaus.stax2.internNames" and "org.codehaus.stax2.internNsUris" properties on my Fastinfoset "StAXDocumentParser" (which implements XMLStreamReader). It's not 100% clear to me why these must be set to "true" in order to prevent Jaxb from interning strings, but that is how it works.
These JUnit-driven tests are what I used to conclude that disabling Jaxb's string interning behavior makes a big performance difference:
https://github.com/gjd6640/fastinfoset-performance-evaluation
So my question is multi-part:
1) Am I misunderstanding something important, such that I shouldn't be trying to disable Jaxb's string interning behavior in the first place?
2) Is there a better way to direct Jaxb not to intern strings? The "StAXManager" class doesn't allow you to set these Woodstox-oriented properties. For this test I ended up extending StAXManager as shown below to work around the problem. This is a hack that I'd prefer not to use in production. I suspect the idea is that when Jaxb is unmarshalling from a Woodstox stream, it checks whether Woodstox is already doing the interning and, if so, skips that step itself. I'm piggybacking on that logic in the Jaxb library, so I would like a better way to go about this.
    package com.sun.xml.fastinfoset.stax;

    public class JaxbStringInternSuppressionStaxManager extends StAXManager {
        public JaxbStringInternSuppressionStaxManager() {
            // Add to the allowable list of feature names so that the user may set these "StAXInputFactory" properties
            super.features.put("org.codehaus.stax2.internNames", null);
            super.features.put("org.codehaus.stax2.internNsUris", null);
        }
    }
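For reference, this is roughly the kind of check I suspect Jaxb performs on the reader. It is a sketch under that assumption, not the actual Jaxb source: the class and method names are hypothetical, and the stub reader (built on the JDK's StreamReaderDelegate) only stands in for a parser that advertises the two properties.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

public class InternPropertyCheckSketch {
    static final String INTERN_NAMES = "org.codehaus.stax2.internNames";
    static final String INTERN_NS_URIS = "org.codehaus.stax2.internNsUris";

    /** True only if the reader reports the property as Boolean.TRUE. */
    public static boolean boolProp(XMLStreamReader reader, String name) {
        try {
            return Boolean.TRUE.equals(reader.getProperty(name));
        } catch (RuntimeException e) {
            return false; // readers may reject property names they don't know
        }
    }

    /** Assumed logic: the consumer skips its own interning only if both are true. */
    public static boolean readerClaimsInterning(XMLStreamReader reader) {
        return boolProp(reader, INTERN_NAMES) && boolProp(reader, INTERN_NS_URIS);
    }

    /** Stub reader that answers the two interning properties with the given value. */
    public static XMLStreamReader stubReader(String xml, final Object value) {
        try {
            XMLStreamReader plain =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            return new StreamReaderDelegate(plain) {
                @Override
                public Object getProperty(String name) {
                    if (INTERN_NAMES.equals(name) || INTERN_NS_URIS.equals(name)) {
                        return value;
                    }
                    return super.getProperty(name);
                }
            };
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readerClaimsInterning(stubReader("<a/>", Boolean.TRUE)));
        System.out.println(readerClaimsInterning(stubReader("<a/>", Boolean.FALSE)));
    }
}
```

If the check really works this way, it would explain why setting both properties to "true" turns Jaxb's interning off: the reader is claiming that the names are already interned.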
Update:
As usual, "A question well-put is half-answered". I've just noticed while drafting this question that "com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector" checks whether class "com.sun.xml.internal.fastinfoset.stax.StAXDocumentParser" is assignable from the class of the XMLStreamReader that you're using and, if so, does not enable string interning. In my case the stream object is a "com.sun.xml.fastinfoset.stax.StAXDocumentParser" (note: no "internal" in the package name), so that check fails and interning doesn't get disabled. Now the question is "why does it do this only for the internal flavor of the Fastinfoset library?" Maybe I'll find the answer by carefully reading this post.
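A minimal sketch of that style of check, to show why a class with the same simple name in a different package doesn't pass it. This is only an illustration using JDK classes as stand-ins; the helper names are hypothetical and it is not the StAXStreamConnector source.

```java
public class FastPathCheckSketch {

    /** Loads a class by name, or returns null if it is not on the classpath. */
    public static Class<?> tryLoad(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    /** True only if the expected class was found and the object is an instance of it. */
    public static boolean matches(Class<?> expected, Object reader) {
        return expected != null && expected.isAssignableFrom(reader.getClass());
    }

    public static void main(String[] args) {
        Class<?> expected = tryLoad("java.lang.Number");

        // An Integer is a Number, so the check passes.
        System.out.println(matches(expected, Integer.valueOf(1)));

        // A String is unrelated to Number, so the check fails, just as a
        // same-named class from another package would.
        System.out.println(matches(expected, "not a number"));

        // If the expected class can't be loaded at all, nothing matches.
        System.out.println(matches(tryLoad("no.such.pkg.StAXDocumentParser"), "anything"));
    }
}
```

Class identity in Java includes the package, so the two StAXDocumentParser classes are unrelated types as far as isAssignableFrom is concerned.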
Also, if there's a better forum for this sort of question such as an active developer user group please share that info and I'll see about linking them over to this post so that the right people will see this question.