3

I tried to migrate some Apache Digester-based XML serialization code to JAXB, with very poor results. Is there a best practice for this design pattern (or should I give up on JAXB for this one)?

The XML relies heavily on interfaces and reflective declarations.

<someContainer>

  <someChild class="foo.Class1">
  </someChild>

</someContainer>

I resolved the indirection using an adapter with this unmarshalling code:

public ResultType unmarshal(Object object) throws Exception {
    Element element = (Element) object;
    String classname = element.getAttribute("class");
    Class clazz = Class.forName(classname);
    JAXBContext jc = getContext(clazz);
    Unmarshaller unmarshaller = jc.createUnmarshaller();
    Object result = unmarshaller.unmarshal(element, clazz);
    if (result instanceof JAXBElement) {
        return (ResultType) ((JAXBElement) result).getValue();
    }
    return (ResultType) result;
}

Well, even with JAXBContext caching this is far slower than the Digester-based code (no exact measurements at this point, let's say about 50 to 100 times slower). The main cause of the performance loss seems to be the "element" parameter passed to unmarshal. Most of the time is spent creating a new DocumentBuilder environment needed to re-transform the already parsed Element.
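
The `getContext(clazz)` helper is not shown above. A minimal sketch of how a per-class cache could look is given here, assuming the expensive object can be built from the class alone. `PerClassCache` is a hypothetical name, and the factory is a stand-in for something like `JAXBContext.newInstance(clazz)`, so the sketch compiles without JAXB on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: builds one expensive object per class and reuses it afterwards. */
public class PerClassCache<T> {
    private final Map<Class<?>, T> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, T> factory;

    public PerClassCache(Function<Class<?>, T> factory) {
        this.factory = factory;
    }

    /** Returns the cached value for clazz, invoking the factory only on the first request. */
    public T get(Class<?> clazz) {
        return cache.computeIfAbsent(clazz, factory);
    }

    public static void main(String[] args) {
        AtomicInteger builds = new AtomicInteger();
        // Stand-in for an expensive call such as JAXBContext.newInstance(clazz) (assumption).
        PerClassCache<String> contexts = new PerClassCache<>(c -> {
            builds.incrementAndGet();
            return "context for " + c.getName();
        });
        contexts.get(String.class);
        contexts.get(String.class);   // served from the cache
        contexts.get(Integer.class);
        System.out.println(builds.get()); // prints 2: one build per distinct class
    }
}
```

Such a cache only removes the context creation cost. It does not touch the DOM re-processing described above.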

Any advice?

skaffman
mtraut
  • What does "by far out of scope" mean? And which failure are you referring to? – skaffman Mar 18 '12 at 16:53
  • This means that the performance is very bad. While the original Digester code has a 2 to 4 second delay on large documents, the JAXB code above takes on the order of minutes. What is a "failure" (in the sense that my code is not usable in this context) is the treatment of the element parameter, which is treated as a DOM source again and processed by a new DocumentBuilder. At first glance this seems to be where most of the time is spent. – mtraut Mar 19 '12 at 08:22

2 Answers

4

I do a lot of things like this and can't complain about poor JAXB performance, which is a top priority in my case. Exactly the same task takes no more than 3-4 ms for a 5 KB payload on a pretty slow server. As for caching, I usually create the following bidirectional data structure for marshalling/unmarshalling, adapted for your case:

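  // Keyed by fully qualified class name, as found in the "class" attribute.
  Map<String, MarshallData> marshalCache = new HashMap<String, MarshallData>();

  public ResultType unmarshal(Object object) throws Exception {
    Element element = (Element) object;
    String classname = element.getAttribute("class");
    MarshallData md = marshalCache.get(classname);
    if (md == null) {
        // Fail with a clear message instead of a NullPointerException.
        throw new IllegalStateException("No marshall data registered for " + classname);
    }
    Object result = md.unmarshaller.unmarshal(element);
    if (result instanceof JAXBElement) {
        return (ResultType) ((JAXBElement) result).getValue();
    }
    return (ResultType) result;
  }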

  public void registerMarshallData(Class clazz) throws Exception {
    JAXBContext jbc =  JAXBContext.newInstance(clazz); // or get it somewhere else if needed
    MarshallData mdata = new MarshallData(jbc.createMarshaller(), jbc.createUnmarshaller());
    marshalCache.put(clazz.getName(), mdata);
  }  

  class MarshallData {
    private Unmarshaller unmarshaller;
    private Marshaller marshaller;

    protected MarshallData(Marshaller marshaller, Unmarshaller unmarshaller) {
      this.marshaller = marshaller;
      this.unmarshaller = unmarshaller;
    }
  }
andbi
  • Do you have a reference *not* using JAXB? At least with the JDK 1.6 built in library "unmarshal(element)" will result in building a new DocumentBuilder that in turn seems to be the reason for an estimated factor 50 performance loss. – mtraut Mar 19 '12 at 08:30
  • @mtraut, I don't have such a reference. When I had similar issues I undertook a small investigation and removed all time-consuming operations from the payload handling path. For example, you don't actually need to create the mentioned `DocumentBuilder` per request; one can use several static prebuilt document builders. Additionally, if you're using large documents, you might consider using SAX or try non-default DOM implementations. – andbi Mar 19 '12 at 08:48
  • I gave your snippet a try and added the unmarshaller itself to the cache, with no success. When you say use SAX or a non-default DOM: do you mean abandoning JAXB? (That is exactly the question: is JAXB appropriate for such a use case?) – mtraut Mar 19 '12 at 08:56
  • @mtraut, no, I mean that JAXB is able to use different sources and targets for marshalling/unmarshalling: IO streams, DOM, SAX sources, XML streams, etc. (see the API for details). One of them may fit your needs better without abandoning JAXB. For instance, it's common knowledge that large documents are better handled with `org.xml.sax.*` than with DOM. Btw, have you checked that you don't create a `DocumentBuilder` per request? It's pretty expensive. – andbi Mar 19 '12 at 09:16
  • See the accepted answer. I do not directly create a `DocumentBuilder` per request. This is done internally by JAXB to handle the `Element`, so your code snippet does the same. You're right insofar as the `DocumentBuilderFactory` lookup was the reason for the degradation. Adding the startup parameter fixes this. I will award the bounty when it is unlocked. Btw, what do you think is the "canonical" code for handling this scenario? – mtraut Mar 19 '12 at 09:22
3

Well, I am adding this as a reference for other people wondering whether they need to abandon JAXB.

Adding

-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl

did the job. The performance now has the same order of magnitude. It seems we have a plain classpath issue. On the large classpath, the service provider lookup needed for the (uncached) DocumentBuilderFactory was the key factor to the bad performance. Give the VM a known implementation to avoid searching....
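
For readers who cannot change the JVM command line, a minimal sketch of setting the same property programmatically follows. It assumes the property is set before the first lookup and that the JDK ships the internal Xerces implementation named above. `PinnedFactoryDemo` and its methods are hypothetical names used only for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class PinnedFactoryDemo {
    static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";
    // Same implementation class as in the -D option above; assumed to exist in this JDK.
    static final String IMPL =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    /** Sets the property unless something is already configured; returns the effective value. */
    static String pinFactory() {
        if (System.getProperty(KEY) == null) {
            System.setProperty(KEY, IMPL);
        }
        return System.getProperty(KEY);
    }

    /** Measures n factory lookups, the call that shows up in the suspended stack trace. */
    static long timeLookupsNanos(int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            DocumentBuilderFactory.newInstance();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long before = timeLookupsNanos(1000);
        pinFactory();
        long after = timeLookupsNanos(1000);
        System.out.println("unpinned: " + before / 1_000_000 + " ms, pinned: "
                + after / 1_000_000 + " ms");
    }
}
```

How large the difference is depends on the classpath. On a short classpath the service provider search is cheap and the two timings may be close.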

I tried this after "stochastic" profiling (press suspend whenever you like :-)) always ended up in a search sequence like this:

Thread [main] (Suspended)   
    owns: VirtualKeyStoreHolder  (id=417)   
    owns: CertificateStoreEnvironment  (id=418) 
    owns: ManagedCertificateProvider  (id=419)  
    owns: CertificateStoreEnvironment$1  (id=420)   
    WinNTFileSystem.getBooleanAttributes(File) line: not available [native method]  
    File.exists() line: 733 
    URLClassPath$FileLoader.getResource(String, boolean) line: 999  
    URLClassPath$FileLoader.findResource(String, boolean) line: 966 
    URLClassPath.findResource(String, boolean) line: 146    
    URLClassLoader$2.run() line: 385    
    AccessController.doPrivileged(PrivilegedAction<T>, AccessControlContext) line: not available [native method]    
    Launcher$AppClassLoader(URLClassLoader).findResource(String) line: 382  
    Launcher$AppClassLoader(ClassLoader).getResource(String) line: 1003 
    Launcher$AppClassLoader(ClassLoader).getResourceAsStream(String) line: 1193 
    SecuritySupport$4.run() line: 96    
    AccessController.doPrivileged(PrivilegedAction<T>) line: not available [native method]  
    SecuritySupport.getResourceAsStream(ClassLoader, String) line: 89   
    FactoryFinder.findJarServiceProvider(String) line: 250  
    FactoryFinder.find(String, String) line: 223    
    DocumentBuilderFactory.newInstance() line: 123  
    TransformerIdentityImpl.createResultContentHandler(Result) line: 215    
    TransformerIdentityImpl.setDocumentLocator(Locator) line: 881   
    DomLoader$State.<init>(DomLoader, UnmarshallingContext) line: 78    
    DomLoader<ResultT>.startElement(UnmarshallingContext$State, TagName) line: 113  
    XsiTypeLoader.startElement(UnmarshallingContext$State, TagName) line: 76    
    UnmarshallingContext._startElement(TagName) line: 481   
    UnmarshallingContext.startElement(TagName) line: 459    
    SAXConnector.startElement(String, String, String, Attributes) line: 148 
    SAXParserImpl$JAXPSAXParser(AbstractSAXParser).startElement(QName, XMLAttributes, Augmentations) line: 501  
    XMLNSDocumentScannerImpl.scanStartElement() line: 400   
    XMLNSDocumentScannerImpl$NSContentDriver(XMLDocumentFragmentScannerImpl$FragmentContentDriver).next() line: 2755    
    XMLNSDocumentScannerImpl(XMLDocumentScannerImpl).next() line: 648   
    XMLNSDocumentScannerImpl.next() line: 140   
    XMLNSDocumentScannerImpl(XMLDocumentFragmentScannerImpl).scanDocument(boolean) line: 511    
    XIncludeAwareParserConfiguration(XML11Configuration).parse(boolean) line: 808   
    XIncludeAwareParserConfiguration(XML11Configuration).parse(XMLInputSource) line: 737    
    SAXParserImpl$JAXPSAXParser(XMLParser).parse(XMLInputSource) line: 119  
    SAXParserImpl$JAXPSAXParser(AbstractSAXParser).parse(InputSource) line: 1205    
    SAXParserImpl$JAXPSAXParser.parse(InputSource) line: 522    
    UnmarshallerImpl.unmarshal0(XMLReader, InputSource, JaxBeanInfo) line: 211  
    UnmarshallerImpl.unmarshal(XMLReader, InputSource) line: 184    
    UnmarshallerImpl(AbstractUnmarshallerImpl).unmarshal(InputSource) line: 137 
    UnmarshallerImpl(AbstractUnmarshallerImpl).unmarshal(InputStream) line: 184 
    VirtualKeyStoreTools.createVirtualKeyStore(InputStream) line: 54    
... more to come

As there's a pending bounty, I'll offer this to @Osw (when the bounty can be applied), as the answer "Yes, you can JAXB" is correct. I think minor performance differences do not justify using a plain SAX solution or staying any longer with the Digester. In addition, caching the JAXBContext seems wise; I cannot tell about caching the unmarshaller. Maybe someone can add information about this.

Thanks for your support.

EDIT

As requested by @Osw here are some (disappointing) figures for the resulting overall performance.

I did some dirty instrumentation of the old and the new application directly around the parsing. While the figures are "acceptable" for an interactive application loading a bunch of files, I must admit that the Digester is still more than 2x faster.

  • Digester based, ~1 MB file containing 4000 generic entries = 400ms
  • JAXB based, see above, 900ms

The JAXB implementation already contains a cache for the unmarshaller, so the "out of the box" optimizations are done. This leaves me with a functional application that urgently needs some more profiling. I will come back when this is done and if some tricks of general interest pop up.

mtraut
  • Thanks for the investigation, it's pretty surprising, adding to favorites. Could you please also post profiling data _after_ the fix? – andbi Mar 19 '12 at 09:36
  • A 2x gap seems more explainable to me: SAX vs DOM, plus a single-purpose solution vs a Java-wide technology. Assuming that your code has no bottlenecks, I still believe the gap can be narrowed to some extent by using `SAX` or `StAX` XML streaming or by testing different XML parsers. The first will require serious code rewriting but the chances are better. It's up to you to decide whether it's worth it. – andbi Mar 21 '12 at 06:10
  • What I'm searching for and missing is an (official) hook that allows one to "inject" the Java class into the unmarshalling process at the start of the generic element, instead of creating a complete element subtree that needs to be parsed again. That should result in better performance and IMHO cleaner code. Has some geek done that? – mtraut Mar 21 '12 at 13:40
  • here are some interesting answers showing how to catch xml stream events prior to passing them to the unmarshaller: http://stackoverflow.com/questions/277502/jaxb-how-to-ignore-namespace-during-unmarshalling-xml-document – andbi Mar 21 '12 at 14:24
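
As a rough illustration of the streaming idea discussed in these comments, the sketch below uses the JDK's StAX reader to inspect the `class` attribute at each start tag without building a DOM subtree. It is an assumption-laden example, not the approach used in the thread: `ClassAttributeScanner` is a hypothetical name, and the JAXB hand-off is only indicated in a comment so that the code runs without JAXB on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ClassAttributeScanner {

    /** Returns the value of every "class" attribute, in document order. */
    static List<String> declaredClasses(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    // A null namespace URI means the namespace is not checked.
                    String className = reader.getAttributeValue(null, "class");
                    if (className != null) {
                        // The reader sits on the start tag here. With JAXB available, this
                        // is where the reader and Class.forName(className) could be handed
                        // to Unmarshaller.unmarshal(XMLStreamReader, Class) (not done here).
                        names.add(className);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<someContainer><someChild class=\"foo.Class1\"></someChild></someContainer>";
        System.out.println(declaredClasses(xml)); // prints [foo.Class1]
    }
}
```

If the unmarshaller is actually called inside the loop, it consumes events up to the end of the element, so the loop would need to check the current event before calling `next()` again.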