4

I have an XSLT stylesheet that I'm executing via the xdmp:invoke() function, and it takes a very long time to return any result (in some instances it times out completely once the maximum timeout of 3600s is reached). The same XSLT runs in approximately 5 seconds in the Oxygen editor. Some areas I think may be impacting performance:

  1. The XSLT produces multiple output files, using xsl:result-document. The MarkLogic XSLT processor outputs these as result XML nodes, as it cannot physically save these documents to a file system.
  2. The XSLT builds variables that contain XML nodes, which are then processed by other template calls. At times these variables can hold a large set of XML nodes.

I've done some profiling on the XSLT, and building the variables seems to be the most time-consuming part of the execution. I'm wondering why that's the case, and why it runs so much faster on the Saxon processor.

Any insight is much appreciated.

Mads Hansen
SalH

3 Answers

2

My understanding is that there are some XSLT performance optimizations that are difficult or impossible to implement in the context of a database in comparison to a filesystem. Also, Saxon is the industry leader in XSLT and is significantly faster than almost anything on the market, although that probably doesn't account for the large discrepancy you describe.

You don't say which version of MarkLogic you're running, but version 8.0 has made significant improvements in XSLT performance. A few simple tests I ran suggested 3-4x speed improvement, depending on the XSLT.

I have run into some rare but serious performance edge cases for XSLT when running MarkLogic on Windows. Linux and OS X builds don't appear to have this problem. It is also far more pronounced when the XSLT tasks are running on multiple threads.

It is possible, however, to save data directly to the filesystem instead of the database using xdmp:save.
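As a rough sketch of that idea, the XQuery below runs a stylesheet and writes each returned document node to disk with xdmp:save. The stylesheet path, input URI, and output directory are hypothetical placeholders. It also assumes the target directory already exists on the MarkLogic host and that the executing user has the privilege required by xdmp:save.

```xquery
xquery version "1.0-ml";

(: Hypothetical stylesheet path and input URI; replace with your own. :)
let $results := xdmp:xslt-invoke("/xslt/split.xsl", fn:doc("/content/source.xml"))
(: Keep only document nodes, in case the result sequence contains other items. :)
for $doc at $i in $results[. instance of document-node()]
(: Hypothetical output directory on the MarkLogic host's filesystem. :)
return xdmp:save(fn:concat("/tmp/xslt-out/result-", $i, ".xml"), $doc)
```

The files are written on the server host, not on the client that submitted the query.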

Unless your XSLTs involve very complex templating rules, I would recommend at least testing some of the performance-sensitive XSLT logic in XQuery. It may be possible to port the slowest parts and pass the results of those queries to the XSLT. It's not ideal, but you might be able to achieve acceptable performance without rewriting the XSLTs.
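A minimal sketch of that approach, under assumptions: the node set that the stylesheet used to build in a variable is built in XQuery instead, then handed to the stylesheet through the parameter map of xdmp:xslt-invoke. The element names (item, title, lookup, entry), the document URI, and the stylesheet path are all invented for illustration, and the stylesheet would need to declare a matching top-level `<xsl:param name="lookup"/>`.

```xquery
xquery version "1.0-ml";

(: Hypothetical input document URI. :)
let $input := fn:doc("/content/source.xml")
(: Build the expensive node set in XQuery rather than in an xsl:variable.
   The element names here are illustrative only. :)
let $lookup :=
  <lookup>{
    for $item in $input//item
    return <entry id="{$item/@id}">{fn:string($item/title[1])}</entry>
  }</lookup>
(: Pass the prebuilt nodes to the stylesheet as the "lookup" parameter. :)
let $params := map:map()
let $_ := map:put($params, "lookup", $lookup)
return xdmp:xslt-invoke("/xslt/render.xsl", $input, $params)
```

Timing the XQuery portion separately (for example with xdmp:elapsed-time()) should show whether the variable construction itself is the slow part.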

Another idea, if the problem is simply the construction of variables in a multi-pass XSLT, is to break the XSLT into multiple XSLTs and make multiple calls to xdmp:xslt-invoke from XQuery. However, I know there is some overhead to making an xdmp:xslt-invoke call, so it may be a wash, or it may be worse.
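Something along these lines, as a sketch only: the first stylesheet produces the intermediate nodes, and the second receives them as a parameter. Both stylesheet paths, the input URI, and the parameter name "intermediate" are hypothetical, and the second stylesheet is assumed to declare `<xsl:param name="intermediate"/>`.

```xquery
xquery version "1.0-ml";

(: Hypothetical input document URI. :)
let $input := fn:doc("/content/source.xml")
(: Pass 1: produce the intermediate nodes that a single multi-pass
   stylesheet would otherwise hold in a variable. :)
let $intermediate := xdmp:xslt-invoke("/xslt/pass1.xsl", $input)
(: Pass 2: hand the intermediate result to the next stylesheet as a parameter. :)
let $params := map:map()
let $_ := map:put($params, "intermediate", $intermediate)
return xdmp:xslt-invoke("/xslt/pass2.xsl", $input, $params)
```

Measuring each pass on its own should also make it easier to see which stage accounts for the time.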

wst
  • Thanks for the quick response. Sorry, I'm running MarkLogic 7.0-4.1 on a Linux OS. It's good to know that there are some improvements in 8.0. Unfortunately, I cannot save the documents, since the XQuery layer uses the results to do further processing of the documents. – SalH Nov 04 '15 at 21:05
  • @SalH While it's kind of a nebulous and potentially time-consuming task, I think your best bet is to build a test case and try to break up the XSLT logic to better isolate your problem and see if alternate ways of processing (in smaller XSLT parts, or with some XQuery parts) perform better. Also, if your task is constructing very large XML in memory, there are often better ways to do that by leveraging the db. As a last resort, you could put Saxon behind a webserver and make HTTP calls via ML. – wst Nov 04 '15 at 21:41
0

I have come across similar performance issues with stylesheets in ML 7. Come to think of it, my stylesheets were similar to the ones you describe, i.e. variables holding sequences of nodes. It seems XSLT cannot be optimised as well as XQuery. If you are not satisfied with the performance of your stylesheets, I would recommend converting the XSLT to its equivalent XQuery. I did this and achieved about 1~1.5 secs performance gains. It may be worth the effort :)

Jinson George
  • Thanks for the suggestion. It seems like that may be the route I have to take. I've tested the XSLT with ML8; however, I don't see any significant improvement, and it still led to a timeout. – SalH Jan 08 '16 at 21:13
0

Well, in my case it seems that using the fn:not() function in template match rules is causing the slow performance. If someone else is experiencing the same problem, this might be a good starting point.

Mads Hansen
SalH
  • It's highly unlikely that it's `fn:not` causing your performance problem and far more likely that it's the expressions inside it. – wst Feb 02 '16 at 00:39