8

UPDATE 17.Jul.2013:
XALAN 2.7 does not cache document() calls within a request. So it is crucial to store each needed document in a variable in the XSL.


I have searched for quite a while and didn't find concrete answers to my simple question:

Which approach is faster, or is the processor "smart" enough that both variants perform the same?

Note: I am using Xalan 2.7 (default implementation in JDK 1.6):

1) I have to read a property in an external XML:

<xsl:value-of select="document($path)/person/address/city"/>

Whenever I need the city, I use the expression above (let's say 100 times).

2) Instead of calling document() 100 times, I store the XML node in a variable:

<xsl:variable name="node" select="document($path)"/>

And then I use this 100 times:

<xsl:value-of select="$node/person/address/city"/>

Which one is faster or better, and why? Thank you!

basZero
  • I'm also interested in an expert answer. As I see it, repeated calls to `document(path_to_doc)` depend on how the XSLT processor implements caching, whereas a document node stored in a variable is loaded only once in any case. – Phillip Kovalev May 10 '11 at 08:35
  • Yes, I also guess that it **depends on the implementation** of the processor, but I'm curious how **Xalan 2.7 (default processor in JDK 1.6)** does it. – basZero May 10 '11 at 09:10
  • I'm not 100% positive, but I think Xalan does not cache `document()` results, while xsltproc does. However, the `document()` argument is interpreted as a URI ([see spec](http://www.w3.org/TR/xslt#add-func)), so aggressive caching would make perfect sense. – Robert Bossy May 10 '11 at 09:33
  • Good question, +1. See my answer for explanation and a recommendation of a third, more efficient solution. – Dimitre Novatchev May 10 '11 at 16:22
  • **Tested with XALAN 2.7** : each `document()` call will be executed and includes physical file access. So at least for XALAN 2.7 it makes a lot of sense to store the document in a variable. I updated my question with the test results. – basZero Jul 17 '13 at 07:27

2 Answers

3

Both methods should take the same time unless the XSLT processor is naive, because the document() function should return the same result when it is called with the same argument(s), no matter how many times.

Neither method is efficient, because of the // abbreviation, which causes the whole document tree to be traversed.

I would recommend the following as more efficient than both of the methods being discussed:

<xsl:variable name="vCities" select="document($pUrl)//cities"/>

Then reference only $vCities.

In this way you have traversed the document only once.

Dimitre Novatchev
  • +1. Dimitre, can you give me a reference for the idempotence rule you mentioned? I have heard that before but was surprised not to see it in the XSLT 1.0 or 2.0 specs. – LarsH May 10 '11 at 15:31
  • By the way, the `//` was only an example and should not have been part of my question, sorry! The focus is on the `document()` function, so I'm still unsure whether it makes a difference in **XALAN 2.7**! – basZero May 11 '11 at 13:01
  • **Corrected question**: It does not contain the bad example anymore. I removed it because the discussion here should be about the `document()` function. – basZero May 11 '11 at 13:03
  • @basZero -- you can and must run your own benchmark. I believe Xalan is not a naive non-optimizing processor and that you will not gain much, if anything, by adding your own caching. – Dimitre Novatchev May 11 '11 at 13:14
  • **Tested with XALAN 2.7** : each `document()` call will be executed and includes physical file access. So at least for XALAN 2.7 it makes a lot of sense to store the document in a variable. – basZero Jul 17 '13 at 07:23
2

It seems that you understand the principles involved, so you don't need any explanations there.

If you want to know how Xalan 2.7 does it, the definitive answer will be found by testing it with Xalan 2.7, with a large enough test.
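One way such a test could be set up from Java is to hand the transformer a `URIResolver` that counts how often `document()` actually asks for the external file. The following is only a sketch under assumptions: the class name `DocumentCallProbe`, the URI `person.xml`, the sample city, and the three repetitions (standing in for the 100 in the question) are all made up for illustration, and it needs Java 8+ for the lambda. It runs against whatever JAXP processor is on the classpath, which may behave differently from a standalone Xalan 2.7.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DocumentCallProbe {
    // Hypothetical external document; in the question it would be read from $path.
    static final String PERSON_XML =
        "<person><address><city>Zurich</city></address></person>";

    // Variant 1: every use calls document() again.
    static final String REPEATED = stylesheet("",
        "<xsl:value-of select='document($path)/person/address/city'/>");

    // Variant 2: document() is called once and the result is kept in a variable.
    static final String CACHED = stylesheet(
        "<xsl:variable name='node' select='document($path)'/>",
        "<xsl:value-of select='$node/person/address/city'/>");

    // Builds a stylesheet that evaluates the given expression three times.
    static String stylesheet(String globals, String use) {
        return "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:param name='path' select=\"'person.xml'\"/>"
            + globals
            + "<xsl:template match='/'>" + use + use + use + "</xsl:template>"
            + "</xsl:stylesheet>";
    }

    static final class Result {
        final String output;
        final int resolverCalls;
        Result(String output, int resolverCalls) {
            this.output = output;
            this.resolverCalls = resolverCalls;
        }
    }

    // Runs one stylesheet and reports how often the processor asked for the document.
    static Result run(String xsl) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        // The resolver serves the document from memory and counts each request.
        t.setURIResolver((href, base) -> {
            calls.incrementAndGet();
            return new StreamSource(new StringReader(PERSON_XML));
        });
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<root/>")), new StreamResult(out));
        return new Result(out.toString(), calls.get());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("repeated document(): " + run(REPEATED).resolverCalls + " resolver call(s)");
        System.out.println("variable:            " + run(CACHED).resolverCalls + " resolver call(s)");
    }
}
```

If the first count grows with the number of uses while the second stays at one, the processor is not caching `document()` results. For timing rather than counting, the same harness could wrap `run` in a loop over a much larger document.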

As @Dimitre noted, neither one of these is necessarily efficient, because of the //, though some processors are smart about optimizing those kinds of paths, mitigating the problem. You could help the processor be more efficient by keeping the city element in a variable:

<xsl:variable name="city" select="(document($path)//city)[1]"/>
...
<xsl:value-of select="$city"/>

I added [1] in there for further optimization because you said "the city" (i.e. you expect only one), and this allows a smart processor to stop after it finds the first city element.

LarsH
  • The discussion is not about the `//`, I removed it from the example. I will test the `document()` by trying to see requests in the log for every `document()` call. But before investing time in this, I thought somebody here would know it (from the source code). – basZero May 11 '11 at 13:05
  • Anyone care to explain why the downvote? Don't know if it was from @bas – LarsH May 11 '11 at 19:37
  • **Tested with XALAN 2.7** : each `document()` call will be executed and includes physical file access. So at least for XALAN 2.7 it makes a lot of sense to store the document in a variable. – basZero Jul 17 '13 at 07:26