I am using the function below, but I am not sure whether it is always safe. Is it? Is there any leftover DOM memory or "residual XSLT" in the reused processor?

   function XSLproc_reuse($domXsl) {
      static $XSLproc=NULL;
      if (!$XSLproc)
           $XSLproc = new XSLTProcessor();
      return $XSLproc->importStylesheet($domXsl); // STABLE?
   }

Could it cause "surprise side effects" later?

PS: I have some strange bugs in my XSLT processing, so I am posting one hypothesis (of many) here, to check whether this pattern is OK or must be avoided. The problem is more evident with XPath; see this other related question.


Another way, which REUSES MORE of the processing (and which I was using in my library), is to also reuse the imported XSLT:

   function XSLproc_reuse2($nameOrDomXsl='', $domXsl=NULL) {
      static $XSLproc=NULL;
      static $name='';

      if (!$XSLproc)
         $XSLproc = new XSLTProcessor();
      // else reuse the already initialized $XSLproc.

      if (is_object($nameOrDomXsl))
         return $XSLproc->importStylesheet($nameOrDomXsl); // STABLE?
      elseif ($nameOrDomXsl==$name)
         return $XSLproc;  // imported in the last call, STABLE?
      else { // recording for future reuse:
         $name = $nameOrDomXsl;
         return $XSLproc->importStylesheet($domXsl);
      }
   }
Peter Krauss

2 Answers


To understand the problem, it is important to understand how XSLTProcessor stores data internally and what happens after calling XSLTProcessor::importStylesheet. The code that implements this class is located in \ext\xsl\xsltprocessor.c in the PHP source code.

The explanation has to simplify things a bit, because the extension is written in plain C. What looks like a PHP object is, in C terms, just a set of functions operating on a shared context.

We need to understand what happens to the imported data:

  1. XSLTProcessor::importStylesheet accepts a DOMDocument or SimpleXMLElement object.
  2. The first thing that happens on import (from line 409 of the source; docp is the importStylesheet parameter):

    // php_libxml_import_node (in \ext\libxml\libxml.c) just returns
    // the real libxml2 `xmlNodePtr` for the given PHP object.
    nodep = php_libxml_import_node(docp TSRMLS_CC);
    if (nodep) {
        doc = nodep->doc;
    }
    if (doc == NULL) {
        php_error(E_WARNING, "Invalid Document");
        RETURN_FALSE;
    }
    
    // The next lines are the original comment and the call to `xmlCopyDoc`,
    // which makes a copy of your stylesheet. These are the key lines for my answer.
    
    /* libxslt uses _private, so we must copy the imported 
    stylesheet document otherwise the node proxies will be a mess */
    newdoc = xmlCopyDoc(doc, 1);
    ....
    // Here the internal stylesheet object is created with a libxslt function.
    sheetp = xsltParseStylesheetDoc(newdoc);
    ...

    // A few lines later it is stored in the internal data of this
    // XSLTProcessor instance.
    php_xsl_set_object(id, sheetp TSRMLS_CC); 
    
  3. After importStylesheet you can do anything you want with your $stylesheet object. It does not affect the XSLTProcessor, because the processor uses a copy of $stylesheet. For the same reason, changes to $stylesheet only reach the processor if you call importStylesheet again.
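
The copy behaviour described in step 3 can be checked with a small experiment. This is only a sketch: the stylesheet, the `<root/>` input and all variable names are made up for illustration, and it assumes the xsl extension is loaded.

```php
<?php
// Illustrative stylesheet (not from the question): it always outputs "original".
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/"><xsl:text>original</xsl:text></xsl:template>
</xsl:stylesheet>
XSL;

$stylesheet = new DOMDocument();
$stylesheet->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($stylesheet);

// Modify the stylesheet DOM after it has been imported.
$textNode = $stylesheet->getElementsByTagNameNS(
    'http://www.w3.org/1999/XSL/Transform',
    'text'
)->item(0);
$textNode->nodeValue = 'changed';

$input = new DOMDocument();
$input->loadXML('<root/>');

// The processor still transforms with its private copy of the stylesheet.
$before = $proc->transformToXml($input);

// Only a new import makes the processor see the modified stylesheet.
$proc->importStylesheet($stylesheet);
$after = $proc->transformToXml($input);

var_dump($before, $after);
```

If the processor really works on a copy, `$before` should still hold the text from the original stylesheet and `$after` the modified one.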

What about XSLTProcessor::transformToDoc (php_xsl_apply_stylesheet, from line 477 of the same source) and the other transform methods? Each of them allocates its own output type (a DOMDocument in the case of XSLTProcessor::transformToDoc) and uses the internal sheetp object (created in importStylesheet) to transform the data.

Edited after comments

  1. Some words about 'when' and/or 'why' the reuse is not valid. Reuse is valid and recommended whenever you need to call a transformTo method more than once. You should especially reuse when a big stylesheet uses xsl:key, because of the additional XML node traversal.
  2. Caching just the XSLTProcessor object does not make any sense: the constructor of this object does not allocate anything. XSLproc_reuse2, on the other hand, makes sense and should be used. It caches the copying of $stylesheet and the traversal needed for xsl:key. What is cached is not a pointer but the whole object and its internals.

A few more words about how transformToDoc works:

  1. It allocates a new DOMDocument object.
  2. If $stylesheet uses xsl:key, it makes a copy of your $doc parameter.
  3. It performs the transformation with xsltNewTransformContext and xsltApplyStylesheetUser from libxslt.
  4. It returns the DOMDocument.

There is no code in XSLTProcessor or libxslt that would make reusing an XSLTProcessor go wrong. Before version 0.7.2 of xslcache I tried to work with that project and tune it for our site, so this comes from my own experience. At that time we used XSLT as a template engine with large XSLT templates (around 3-5 MB of minified code). In such cases caching importStylesheet gives a big performance boost. But there is no way to cache transformToDoc results: each time you call it, libxslt performs DOM manipulation on the two prepared objects in memory and gives you a new object as the result.
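
As a sketch of that pattern (one import, several transformations), the following uses a made-up stylesheet and made-up `<item>` inputs. The names are illustrative and do not come from the question.

```php
<?php
// Illustrative stylesheet: copies the id attribute of <item> into <out>.
$stylesheet = new DOMDocument();
$stylesheet->loadXML(<<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/item">
    <out><xsl:value-of select="@id"/></out>
  </xsl:template>
</xsl:stylesheet>
XSL
);

$proc = new XSLTProcessor();
$proc->importStylesheet($stylesheet); // the expensive step, done once

$results = array();
foreach (array('a', 'b') as $id) {
    $input = new DOMDocument();
    $input->loadXML('<item id="' . $id . '"/>');
    // Each call allocates and returns a fresh DOMDocument.
    $results[$id] = $proc->transformToDoc($input);
}

$sameObject = ($results['a'] === $results['b']);
$textA = $results['a']->documentElement->textContent;
$textB = $results['b']->documentElement->textContent;

var_dump($sameObject, $textA, $textB);
```

The two results should be distinct objects, each reflecting only its own input, which is what you would expect if the processor keeps no state from one transformation to the next.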

Nick Bondarenko
  • Thanks a lot, you really understand how it works! Can you add some lines "to explain the conditions of 'when' and/or 'why' the reuse is not valid" (see my bounty text)? I understand that my `XSLproc_reuse()`, which always redoes *importStylesheet*, is stable, and that my `XSLproc_reuse2()` is dangerous because, as you say, "cant to refresh or update your $stylesheet without call importStylesheet again". – Peter Krauss Nov 20 '13 at 17:46
  • Another point: my function `XSLproc_reuse()` is meant to gain performance by caching. But it only caches a pointer (?), so it has no use (except as a shortcut to *importStylesheet*). Is that right? – Peter Krauss Nov 20 '13 at 17:53
  • Oops... My aim is to reuse the XSLT in `transformToDoc()`, so what is the problem? The XSLT does not change (!), only the input and the transformed output do. So why (theoretically) do I get instabilities when reusing? – Peter Krauss Nov 20 '13 at 17:59
  • Thanks for your edited notes! Your assertion that "XSLproc_reuse2 make sense and should be used" may be wrong: I have observed instabilities when not "refreshing" (see related questions). As you are an expert, maybe you can answer [a similar bounty here](http://stackoverflow.com/questions/20032717). – Peter Krauss Nov 21 '13 at 12:43
  • Peter, I have added an answer to your similar question. – Nick Bondarenko Nov 22 '13 at 13:59

Using static defines global state, which is by definition "unstable": it can be changed from anywhere in the program. With an object you get local state (inside the object instance). I also suggest using an array, so that it can store several processors for different files.

class XsltFactory {

  private $_processors = array();

  public function get($file) {
    if (!isset($this->_processors[$file])) {
      $xslDom = new DOMDocument();
      $xslDom->load($file);
      $xslProc = new XSLTProcessor();
      $xslProc->importStylesheet($xslDom);
      return $this->_processors[$file] = $xslProc;
    }
    return $this->_processors[$file];
  }
}

$xsltFactory = new XsltFactory();
var_dump(
  htmlspecialchars(
    $xsltFactory->get($template)->transformToDoc($xmlDom)->saveXml()
  )
);

A better solution to boost the performance would be xslcache. It caches the result of $xslt->importStyleSheet($filename) inside the process. If the process is reused, so is the compiled xsl.

ThW
  • Hello Thomas, thanks! One point at a time... "static is by definition unstable": I do not understand. What happens with PHP (or even C), is it an unstable language? Do you have a link to a bug report that I should study? – Peter Krauss Nov 22 '13 at 16:10
  • About using a scalar (my illustrative `XSLproc_reuse2()`) or an array (your `XsltFactory` class): I think this does not affect the "instability discussion". I like your class XsltFactory (I will use it!), but I prefer to discuss first the concepts around XSLproc_reuse2. Question: what do you think about `return $XSLproc`, is it stable (if not, where is the problem and what is the workaround)? – Peter Krauss Nov 22 '13 at 16:20
  • The problem is not the XsltProcessor instance. As misterion described, it is encapsulated in itself. If you use static to store state, it becomes global state, and a call from anywhere in your application can modify it. It is a conceptual problem, not one of the language. You should be really careful using static and avoid it if possible. – ThW Nov 22 '13 at 16:45