54

The current code base that I am looking at uses the DOM parser. The following code fragment is duplicated in 5 methods :

 DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
 DocumentBuilder builder = factory.newDocumentBuilder();

If a method that contains the above code is called in a loop or the method is called multiple times in the application, we are bearing the overhead of creating a new DocumentBuilderFactory instance and a new DocumentBuilder instance for each call to such a method.

Would it be a good idea to create a singleton wrapper around the DocumentBuilder factory and DocumentBuilder instances as shown below :

public final class DOMParser {
   private DocumentBuilderFactory = new DocumentBuilderFactory();
   private DocumentBuilder builder;

   private static DOMParser instance = new DOMParser();

   private DOMParser() {
      builder = factory.newDocumentBuilder();
   }

   public Document parse(InputSource xml) {
       return builder.parser(xml);
   }
}

Are there any problems that can arise if the above singleton is shared across multiple threads? If not, will there be any performance gain by using the above approach of creating the DocumentBuilderFactory and the DocumentBuilder instances only once throughout the lifetime of the application?

Edit :

The only time we can face a problem is if DocumentBuilder saves some state information while parsing an XML file which can affect the parsing of the next XML file.

Chetan Kinger
  • 15,069
  • 6
  • 45
  • 82
  • 1
    I suggest you only optimise your code when you know you have a problem i.e. you have used a cpu or memory profiler. The rest of the time you should make your code as simple and easy to understand as possible and this will often perform well. – Peter Lawrey Sep 17 '12 at 08:34
  • possible duplicate: http://stackoverflow.com/questions/56737/is-documentbuilder-parse-thread-safe – Denis Tulskiy Sep 17 '12 at 08:35
  • 1
    possible duplicate of [Is DocumentBuilderFactory thread-safe in Java 5+?](http://stackoverflow.com/questions/9828254/is-documentbuilderfactory-thread-safe-in-java-5) – JB Nizet Sep 17 '12 at 08:39
  • another possible duplicate: http://stackoverflow.com/questions/3439485/java-and-xml-jaxp-what-about-caching-and-thread-safety – Denis Tulskiy Sep 17 '12 at 08:48
  • 1
    @PeterLawrey I understand what your getting at but once glance at the code and I can clearly see that the DocumentBuilderFactory and DocumentBuilder are getting created multiple times in loops from methods that uses them! – Chetan Kinger Sep 17 '12 at 10:03
  • @bot so they are not singletons currently. Do you know that creating them multiple times makes a significant difference to your performance? – Peter Lawrey Sep 17 '12 at 10:07
  • That's right. They are not Singletons right now. I am planning to make them a part of my DOMParser Singleton as shown in my question. I am not sure whether creating a new DocumentBuilderFactory and a new DocumentBuilder is expensive or not. Regardless of this fact,don't you think we will still be reducing the amount of time spent on creating these objects for every iteration of a loop or for every method call in which they are used? We will also be reducing the number of objects created! – Chetan Kinger Sep 17 '12 at 10:21

5 Answers5

46

See the comments section for other questions about the same matter. Short answer for your question: no, it's not ok to put these classes in a singleton. Neither DocumentBuilderFactory nor DocumentBuilder are guaranteed to be thread safe. If you have several threads parsing XML, make sure each thread has its own version of DoumentBuilder. You only need one of them per thread since you can reuse a DocumentBuilder after you reset it.

EDIT A small snippet to show that using same DocumentBuilder is bad. With java 1.6_u32 and 1.7_u05 this code fails with org.xml.sax.SAXException: FWK005 parse may not be called while parsing. Uncomment synchronization on builder, and it works fine:

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        final DocumentBuilder builder = factory.newDocumentBuilder();

        ExecutorService exec = Executors.newFixedThreadPool(10);
        for (int i = 0; i < 10; i++) {
            exec.submit(new Runnable() {
                public void run() {
                    try {
//                        synchronized (builder) {
                            InputSource is = new InputSource(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><俄语>данные</俄语>"));
                            builder.parse(is);
                            builder.reset();
//                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        exec.shutdown();

So here's your answer - do not call DocumentBuilder.parse() from multiple threads. Yes, this behavior might be JRE specific, if you're using IBM java or JRockit or give it a different DocumentBuilderImpl, it might work fine, but for default xerces implementation - it does not.

30thh
  • 10,861
  • 6
  • 32
  • 42
Denis Tulskiy
  • 19,012
  • 6
  • 50
  • 68
  • I understand that DocumentBuilderFactory and DocumentBuilder are not Thread safe. But if you take a closer look at my implementation of the DOMParser Singleton class, you will notice that there are no methods in this class to modify the DocumentBuilderFactory and DocumentBuilder instances. If N threads are going to call the only public method named parse from DOMParser which simply parses the input XML, I don't see why this would be a problem! – Chetan Kinger Sep 17 '12 at 09:59
  • 2
    @bot: yeah, but parse might not be thread safe either. The fact that there's a `reset()` method on `DocumentBuilder` might suggest that parsing changes something and it is not a good idea to call it from multiple threads. – Denis Tulskiy Sep 17 '12 at 10:12
  • We are making an assumption that parsing an XML file using the DocumentBuilder changes some state in DocumentBuilder. If this assumption is correct, then yes, we will fall into trouble while parsing the next XML file. But this is an assumption we are making. Do you or does anyone know for sure whether DocumentBuilder relies on some state while parsing an XML? – Chetan Kinger Sep 17 '12 at 10:18
  • Now that's what I am talking about. If only I had access to a computer to try this out myself! Thanks a tonne! Does this mean that the parse method is thread safe? – Chetan Kinger Sep 17 '12 at 11:05
  • One more question, does this mean that there won't be any problem while sharing a single instance of DocumentBuilderFactory across multiple threads? – Chetan Kinger Sep 17 '12 at 11:26
  • 4
    @bot: it means that parse method is *not* thread safe. there shouldn't be a problem with DocumentBuilderFactory, but since you want some singleton that parses your files, I'd suggest you to create a pool of DocumentBuilders and use them from it. See http://commons.apache.org/pool/ for a simple implemetation. – Denis Tulskiy Sep 17 '12 at 13:36
  • 1
    FYI - I hit this one even with making my own `Builder` on-the-fly before parsing so the static state must be somewhere deeper. – OldCurmudgeon Jul 03 '14 at 10:33
  • synchronized (builder) {}does not solved my problem, so I decided to create new DocumentBuilder builder = factory.newDocumentBuilder(); for every parsing. It works fine and it's not much slower. – andrej Dec 13 '18 at 17:08
  • I'd like to point out that some Xerces versions, including those built into JDK/JRE, [contain a bug](https://bugs.openjdk.java.net/browse/JDK-8047329), which may cause the `FWK005` exception, even if each thread calls `DocumentBuilderFactory.newInstance()` respectively. That bug report is for `XPathExpression.evaluate(InputSource)`, but internally the same thing happens, even if you instantiate all required factories in each thread. – predi Nov 12 '21 at 09:12
21

The JAXP Specification (V 1.4) says:

It is expected that the newSAXParser method of a SAXParserFactory implementation, the newDocumentBuilder method of a DocumentBuilderFactory and the newTransformer method of a TransformerFactory will be thread safe without side effects. This means that an application programmer should expect to be able to create transformer instances in multiple threads at once from a shared factory without side effects or problems.

https://jaxp.java.net/docs/spec/html/#plugabililty-thread-safety

So, for example, you should be able to create a single DocumentBuilderFactory instance via DocumentBuilderFactory.newInstance and then use that single factory to create a DocumentBuilder per thread via DocumentBuilderFactory.newDocumentBuilder. You could also create a pool of DocumentBuilders.

I can't find anywhere that says that, for example, the static method DocumentBuilderFactory.newInstance is thread-safe. The implementation appears thread-safe in that there is some method synchronization being done, but the spec specifically says that DocumentBuilderFactory.newDocumentBuilder is thread safe.

ttt
  • 211
  • 2
  • 2
3

Primarily to answer your question no, Document Builder isn't thread-safe. But we can make it thread-safe in two ways:

  • Synchronized
  • ThreadLocal

For Synchronized what we can do is just make a synchronised block, this will work for us we should use synchronize on small blocks since it is very expensive and sometimes can make things very slow.

   DocumentBuilder documentBuilder=DocumentBuilderFactory.newInstance().newDocumentBuilder();
   synchronized(documentBuilder)
   {
      documentBuilder.parse(xmlFile.getInputStream());           
   }

The other and better approach we can follow is using ThreadLocal.

    public class XmlParser {

    private static ThreadLocal<DocumentBuilder> documentBuilder;

    public XmlParser() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        documentBuilder = ThreadLocal.withInitial(() -> documentBuilder(factory));
    }
    private DocumentBuilder documentBuilder(DocumentBuilderFactory factory) {
        try {
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new Exception("file is not valid");
        }
    }
    public Document parse(MultipartFile xmlFile) {
        try {
            Document parse = documentBuilder.get().parse(xmlFile.getInputStream());
            documentBuilder.remove();
            parse.normalizeDocument();
            return parse;

        } catch (IOException | SAXException e) {
            throw new Exception(e);
        }
    }
Xavi López
  • 27,550
  • 11
  • 97
  • 161
Supreet Singh
  • 882
  • 12
  • 9
  • 1
    Cool idea. Just one thing: This implementation overwrites the static field `documentBuilder` with every new instance of `XmlParser `. If `XmlParser ` is a singleton, `documentBuilder` does not have to be static. If `XmlParser ` is not a singleton, `documentBuilder` should be initialized in a static manner, like `private static final ThreadLocal documentBuilderThreadLocal = ThreadLocal.withInitial(XmlParser ::createBuilder);` – RoBeaToZ Oct 12 '20 at 10:43
2

You need to know three things:

  1. What is the cost of creating the factory? If the cost is low, your performance gain might be close to zero.
  2. What is the cost of creating the builder? If the cost is low, your performance gain might be close to zero.
  3. Is the factory and/or builder thread safe? If not, you need to make sure the method accessing them is made thread safe using the synchronized keyword.

I'm not familiar with the DocumentBuilder classes you are using, but all this information should be available in its javadoc or other documentation. If the creation of certain objects is costly, they usually throw this information at you.

Rasmus Franke
  • 4,434
  • 8
  • 45
  • 62
  • OK with #1 and #2 answers. This is the real question to ask, before having to think about synchronization and any risky and costly noisy/boiler plate technical code. – skay Aug 21 '14 at 21:36
  • 10
    The cost of creating the factory is huge. Unless you have defined a system property to specifiy your implementation, DocumentBuilderFactory will perform a full classpath scan every time you retrieve an instance. See the FactoryFinder at [link](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/javax/xml/parsers)/DocumentBuilderFactory.java#DocumentBuilderFactory.newInstance%28%29[link] – dan carter May 24 '16 at 02:59
0

can we re-use DocumentBuilder instance and not create it everytime, for us new object creation is taking up 3% of CPU utilization.

Sample:-

class level variable:- private static volatile DocumentBuilder builder = null;

    if (Common.builder == null) {
        synchronized (DocumentBuilder.class) {
            if (Common.builder == null) {
                SplunkLogger.info("DocBuilderInstance=New_Instance");
                Common.builder =
                        XMLUtil.getDocumentBuilderFactory()
                        .newDocumentBuilder(); // DocumentBuilderFactory.newInstance().newDocumentBuilder();
            } else {
                SplunkLogger.info("DocBuilderInstance=Re-Use_Existing_Instance_InnerIf");
            }
        }
    } else {
        SplunkLogger.info("DocBuilderInstance=Re-Use_Existing_Instance");
}
    final InputSource source = new InputSource();
    source.setCharacterStream(new StringReader(responseString));
    final Document doc = Common.builder.parse(source);
    return doc.getElementsByTagName(firstKey);
}