
From here I learned that org.w3c.dom.Node (and other classes in the same package) are not thread-safe.

I am wondering whether, and how, I should cache instances of those classes.

  • Is there a best approach?
  • Do immutable wrapper classes exist?
  • Should I use a ThreadLocal for the DocumentBuilder/DocumentBuilderFactory instead
    and recreate the Node every time?
  • What do you do?
MRalwasser
  • When do you ever get in a situation where multiple threads handle the same `Document` (let alone `Node`)? – Joachim Sauer May 11 '12 at 12:12
  • Imagine a configuration file which is (indirectly) read by a servlet. – MRalwasser May 11 '12 at 12:17
  • Sounds like that file should be parsed once and converted in a format with better accessibility. – Joachim Sauer May 11 '12 at 12:23
  • @JoachimSauer yep, but currently the systems are "as is" and heavily use XPath, which is not trivial to refactor to a simple bean – MRalwasser May 11 '12 at 12:29
  • Note: the DOM classes are not thread safe even if the threads accessing the DOM are only _reading_ the DOM, due to internal node list caching (e.g., see `CoreDocumentImpl.getNodeListCache()` in OpenJDK). – Archie Jul 14 '21 at 21:33

3 Answers


You don't want to cache the XML document. It would be better to read/parse it into a "configuration" object. Depending on how complex or simple your configuration is, it could be a simple Map or something more complex.

One benefit (beyond the concurrency problems from parsing the same doc from multiple threads) is that you are not tied to the XML format for your configuration. Think about how all configs used to be in properties files, then XML came about and every open source package added support for XML. Then annotations came, and that was supported then too. Hibernate is a good example of that.

What you want to do is parse your config file and keep the resulting configuration object in your cache instead of the source XML.
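
As a rough illustration of that idea (a minimal sketch, not code from the legacy system), the snippet below parses the XML once on a single thread and keeps only an unmodifiable `Map`. The class name `ConfigLoader` and the `<entry key="...">` layout are hypothetical; a real configuration file will look different.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: turns the XML into a read-only map so that no DOM
// object outlives the parse and nothing mutable is shared between threads.
final class ConfigLoader {
    private ConfigLoader() {}

    // Assumes a made-up layout: <config><entry key="name">value</entry>...</config>
    static Map<String, String> load(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList entries = doc.getElementsByTagName("entry");
        Map<String, String> values = new HashMap<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            values.put(entry.getAttribute("key"), entry.getTextContent());
        }
        // The Document goes out of scope here; only the read-only view is cached.
        return Collections.unmodifiableMap(values);
    }
}
```

The returned map can be stored in a `static final` field or any cache and read from many threads, because it is never modified after `load` returns.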

mprivat
  • I agree - but I have to support a legacy system which uses XPath expressions all over the place and cannot be easily refactored (at least not short/mid-term) to model classes / unmarshalled JAXB beans – MRalwasser May 11 '12 at 12:32

Your only choice is to synchronize all access to the Document/Nodes. If it is well encapsulated (the DOM objects are maintained by a single class and all DOM manipulation happens within that class), then you can just synchronize that entry-point class. If the Nodes are passed around among other objects, then you have major problems: you would basically need to pick a single object to serve as your "lock" and synchronize on it around all access to the config file Nodes.
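
A minimal sketch of the well-encapsulated case, assuming the legacy callers only need string results from their XPath expressions: the hypothetical `GuardedDocument` class below owns the `Document`, never hands out a `Node`, and funnels every evaluation through one synchronized method. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical entry-point class: the only object that ever touches the DOM.
final class GuardedDocument {
    // Never returned to callers, so no Node can escape the lock.
    private final Document doc;
    // XPath objects are not thread-safe either, so the same lock guards this one.
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    GuardedDocument(String xml) throws Exception {
        this.doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // All reads go through this method; the monitor of this instance is the lock.
    synchronized String evaluate(String expression) throws XPathExpressionException {
        return xpath.evaluate(expression, doc);
    }
}
```

Returning a `String` rather than a `Node` or `NodeList` is the design choice that matters here: once a node leaves the class, callers can read it outside the lock and the guarantee is gone.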

jtahlborn

Sorry, this is not really an answer, but if the use case is a configuration file model shared across multiple threads, where do the writes to the configuration happen? If it is read once and then used many times, you don't need synchronization, right? Am I missing something here?

Ravi
  • yes, you can't safely read DOM objects from multiple threads. they are not thread-safe even for _reading_. – jtahlborn May 12 '12 at 17:18
  • Interesting. Could you point to some resources that explain this? Thanks – Ravi May 14 '12 at 07:19
  • you mean like the link in the question? – jtahlborn May 14 '12 at 12:01
  • Yep. Didn't check that. That's very useful information. But I stand by my original query: if there are NO modifications after the initial building of the DOM model, why would we need to worry about concurrent access? One reason I can think of is that the XPath evaluation somehow modifies the state of the Node on every call. Sounds unlikely to me – Ravi May 14 '12 at 12:33
  • nothing to do with xpath. the DOM nodes lazily load some internal state, i.e., they modify themselves on read, and their self-modification is not thread-safe. and this is not theoretical, we have been bitten by this in production code. – jtahlborn May 14 '12 at 14:13
  • Ohk. Thanks a lot. Useful to know – Ravi May 14 '12 at 14:18