I am confused about the use of XSLT templates and when/how they are applied. Suppose I have the following XML file:

<book>
  <chapter> 1 </chapter>
  <chapter> 2 </chapter>
</book>

and I'd like to match all chapters in order. This is my XSLT stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  <xsl:template match="book">                                                     
    <h1>book</h1>                                                               
  </xsl:template>                                                                 
  <xsl:template match="chapter">                                                  
    <h2>chapter <xsl:value-of select="."/></h2>                                   
  </xsl:template>                                                                                                                                                 
</xsl:stylesheet>                                                               

The result of the stylesheet is

<h1>book</h1>

without the expected numeration of chapters. Adding an <xsl:apply-templates /> at the end of the book matching template didn't help. I'd like to do without an xsl:for-each though.

EDIT: I ought to have mentioned this: I'm using Python's lxml module, which uses libxml2 and libxslt. The following code does not produce the expected result; it prints the output shown above instead:

import lxml.etree
xml = lxml.etree.XML("""                                                    
    <book>                                                                          
      <chapter> 1 </chapter>                                                        
      <chapter> 2 </chapter>                                                        
    </book>                                                                         
""")                                                                            
transform = lxml.etree.XSLT( lxml.etree.XML("""                                  
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
      <xsl:template match="book">                                                   
        <h1>book</h1>                                                               
        <xsl:apply-templates  />                                                    
      </xsl:template>                                                               
      <xsl:template match="chapter">                                                
        <h2>chapter <xsl:value-of select="."/></h2>                                 
      </xsl:template>                                                               
    </xsl:stylesheet>                                                               
""") )                                                                                                                                                      
html = transform(xml)                                                            
print( lxml.etree.tostring(html, pretty_print=True) )

Oddly enough, the correct (expected) result is demonstrated here. Accessing libxslt directly through the Python bindings instead of going through lxml works, however:

import libxml2                                                                  
import libxslt  

doc = libxml2.parseDoc("""                                                  
<book>                                                                      
  <chapter> 1 </chapter>                                                    
  <chapter> 2 </chapter>                                                    
</book>                                                                     
""")                                                                        

styledoc = libxml2.parseDoc("""                                             
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  <xsl:template match="book">                                               
    <h1>book</h1>                                                           
    <xsl:apply-templates  />                                                
  </xsl:template>                                                           
  <xsl:template match="chapter">                                            
    <h2>chapter <xsl:value-of select="."/></h2>                             
  </xsl:template>                                                           
</xsl:stylesheet>                                                           
""")                                                                            
style = libxslt.parseStylesheetDoc(styledoc)                                

print( style.applyStylesheet(doc, None) )                                                                                            

What am I missing?

Jens
  • "*Adding an `<xsl:apply-templates />` at the end of the book matching template didn't help.*" Didn't it? http://xsltransform.net/eiZQaFh Not that that's a correct solution. What is the exact output that you expect to get? – michael.hor257k Jan 31 '15 at 07:19
  • @michael.hor257k: Yes, *that* is what I'd expect. Odd. Also, what would be a more correct solution? – Jens Jan 31 '15 at 07:22
  • **1.** If that is the **exact** result you expect - a snippet of HTML that is not a well-formed XML document, yet carries an XML declaration - then that *would* be the correct approach... **2.** I cannot reproduce your problem using the *libxslt* processor - which I believe is the one used by python. – michael.hor257k Jan 31 '15 at 14:07
  • @michael.hor257k: It's just meant to demonstrate the problem. The code example runs, and prints `<h1>book</h1>\n\n`. That's the root of the confusion because I'd have expected to see the chapters too... – Jens Jan 31 '15 at 15:50
  • If you are getting a result that is different from the one obtained by me in the link above, using the same code, then your issue is not with the code, but with the processor. Start by finding out which processor is being used - see here how: http://stackoverflow.com/questions/25244370/how-can-we-check-that-which-xslt-processor-uses-as-default-in-solr/25245033#25245033 – michael.hor257k Jan 31 '15 at 15:56
  • Oh I was wondering about how to get that information. Thanks! My lxml 3.4.1 (Py 3.4) seems to use libxslt 1.0. That seems rather dated? – Jens Jan 31 '15 at 16:03
  • The 1.0 refers to the version of XSLT, not of the processor. – michael.hor257k Jan 31 '15 at 16:46
  • @michael.hor257k: Got it. In a fresh Python venv I installed `lxml`, which links against `libxslt 1.1.28`. The same problem persists. You mentioned you're using libxslt too, what version? – Jens Jan 31 '15 at 17:34
  • I don't really know; I have a couple of applications that can invoke libxslt, but I am not sure from where. I suspect they just use the library included as part of OS X. I suggest you investigate http://xmlsoft.org/libxslt/index.html – michael.hor257k Jan 31 '15 at 18:36
  • Following the `basic.py` example on [this](http://xmlsoft.org/libxslt/python.html) site, using the direct `libxslt` bindings creates the expected output. Something seems to go awry using lxml. – Jens Jan 31 '15 at 19:30
  • @Jens No, this has only to do with outputting the result, see my answer. – Mathias Müller Jan 31 '15 at 22:09
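
For anyone following the comments above: a minimal sketch of how the processor and its version can be checked from Python, assuming lxml is installed. The version tuples are attributes of `lxml.etree`; the small probe stylesheet and the `<dummy/>` input are made up for illustration. It also shows why "1.0" came up earlier: that is the XSLT language version, not the libxslt release.

```python
from lxml import etree

# Version tuples exposed by lxml.etree, e.g. (1, 1, 28) for libxslt 1.1.28.
print("lxml:              ", etree.LXML_VERSION)
print("libxml2 (runtime): ", etree.LIBXML_VERSION)
print("libxslt (runtime): ", etree.LIBXSLT_VERSION)
print("libxslt (compiled):", etree.LIBXSLT_COMPILED_VERSION)

# Ask the processor itself via the standard XSLT system-property() function.
probe = etree.XSLT(etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="system-property('xsl:vendor')"/>
    <xsl:text> / XSLT </xsl:text>
    <xsl:value-of select="system-property('xsl:version')"/>
  </xsl:template>
</xsl:stylesheet>
"""))

# Vendor name followed by the supported XSLT language version.
vendor_info = str(probe(etree.XML("<dummy/>")))
print(vendor_info)
```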

2 Answers

That really seems odd - unless you realize what happens. This has nothing to do with how lxml performs XSLT transformations, as far as I can see.

It's just that lxml.etree.tostring() expects an object containing well-formed HTML or XML as input. You don't hand it well-formed markup:

<?xml version="1.0"?>
<h1>book</h1>                                                                          
      <h2>chapter  1 </h2>                                                        
      <h2>chapter  2 </h2>

and because you don't, it stops after the first outermost element (yes, there are three). That is wholly justified in my opinion: there shouldn't be any reason not to output well-formed XHTML, and using an XML declaration is awful if what follows is not XML (as others have pointed out).

To prove all this, run the following code. The only change is that I simply print the result.

import lxml.etree
xml = lxml.etree.XML("""                                                    
    <book>                                                                          
      <chapter> 1 </chapter>                                                        
      <chapter> 2 </chapter>                                                        
    </book>                                                                         
""")                                                                            
transform = lxml.etree.XSLT( lxml.etree.XML("""                                  
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

      <xsl:template match="book">                                                   
        <h1>book</h1>                                                               
        <xsl:apply-templates />                                                    
      </xsl:template>                                                               
      <xsl:template match="chapter">                                                
        <h2>chapter <xsl:value-of select="."/></h2>                                 
      </xsl:template>                                                               
    </xsl:stylesheet>                                                               
""") )                                                                                                                                                      
html = transform(xml)                                                            
print(html)

And the result from the command line is

<?xml version="1.0"?>
<h1>book</h1>                                                                          
      <h2>chapter  1 </h2>                                                        
      <h2>chapter  2 </h2>
[EMPTY OUTPUT LINE]
[EMPTY OUTPUT LINE]

And, to state the now-obvious:

  • the code using libxml2 and libxslt works because the print method is different
  • modifying the XSLT stylesheet to insert a single root element works because then tostring() can serialise well-formed XML.

Using lxml 3.4.1, Python sys.version is 2.7.5, Mac OS X.
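
To see the two serialisations side by side, here is a minimal sketch that assumes the same stylesheet as in the question. The variable names `full_text` and `root_only` are only illustrative. With the lxml versions I know of, `str()` on the transform result returns everything libxslt produced, while `lxml.etree.tostring()` writes only the first top-level element:

```python
from lxml import etree

xml = etree.XML("<book><chapter> 1 </chapter><chapter> 2 </chapter></book>")

transform = etree.XSLT(etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="book">
    <h1>book</h1>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="chapter">
    <h2>chapter <xsl:value-of select="."/></h2>
  </xsl:template>
</xsl:stylesheet>
"""))

result = transform(xml)

# str() hands serialisation to libxslt, which writes all three top-level elements.
full_text = str(result)

# tostring() treats the result as an XML document and writes only its root (<h1>).
root_only = etree.tostring(result)

print(full_text)
print(root_only)
```

So the chapters are in the result tree all along; only the choice of serialiser decides whether they show up.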

Mathias Müller
  • You are saying that things transformed correctly all along, but printing the result stuffed up the output? Dang! :) Thanks! – Jens Jan 31 '15 at 22:41
  • @Jens You are welcome. Sorry to reiterate the point: yes, the transformation is "correct" in the sense that it does what you expected, but it's _incorrect_ to have `<?xml version="1.0"?>` in your output if it's _not_ XML. So, if you definitely need this output, you should add `<xsl:output omit-xml-declaration="yes"/>` as a top-level element to the stylesheet. – Mathias Müller Jan 31 '15 at 22:47
  • Thanks! The code was to demo the issue, but thanks for the pointers :) All makes sense. – Jens Jan 31 '15 at 22:57
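
A minimal sketch of the `omit-xml-declaration` suggestion from the comment above, assuming the question's stylesheet with that one `xsl:output` line added. The name `fragment` is just for illustration:

```python
from lxml import etree

xml = etree.XML("<book><chapter> 1 </chapter><chapter> 2 </chapter></book>")

transform = etree.XSLT(etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Suppress the XML declaration, since the output is only a fragment. -->
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:template match="book">
    <h1>book</h1>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="chapter">
    <h2>chapter <xsl:value-of select="."/></h2>
  </xsl:template>
</xsl:stylesheet>
"""))

# str() is used here because the result has several top-level elements.
fragment = str(transform(xml))
print(fragment)
```

The output is still not a well-formed XML document, but at least it no longer claims to be one.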

Not sure why adding <xsl:apply-templates/> didn't work for you. You are missing a root element for your output XML though. This stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
<xsl:output method="xml" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="book">
    <root>
        <h1>book</h1>
        <xsl:apply-templates/>
    </root>
</xsl:template>
<xsl:template match="chapter">
    <h2>chapter <xsl:value-of select="."/>
    </h2>
</xsl:template>
</xsl:stylesheet>          

Would produce:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <h1>book</h1>
    <h2>chapter  1 </h2>
    <h2>chapter  2 </h2>
</root>
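
For the lxml setup from the question, a minimal sketch of the same idea, assuming the stylesheet above (the name `wrapped` is illustrative). Once there is a single `<root>` element, `lxml.etree.tostring()` has a proper document element to serialise and the chapters appear in its output:

```python
from lxml import etree

xml = etree.XML("<book><chapter> 1 </chapter><chapter> 2 </chapter></book>")

transform = etree.XSLT(etree.XML("""
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="book">
    <root>
      <h1>book</h1>
      <xsl:apply-templates/>
    </root>
  </xsl:template>
  <xsl:template match="chapter">
    <h2>chapter <xsl:value-of select="."/></h2>
  </xsl:template>
</xsl:stylesheet>
"""))

result = transform(xml)

# The result now has exactly one top-level element, so nothing is dropped.
wrapped = etree.tostring(result, pretty_print=True).decode()
print(wrapped)
```
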
Lingamurthy CS
  • Why does this work when `<xsl:apply-templates/>` is inside of the `<root>` but it does not when it's outside? – Jens Jan 31 '15 at 12:57
  • Because an XML document can only ever have one element at the top level. In your code that element is the `<h1>`. – Tomalak Jan 31 '15 at 13:26
  • I mean: why do the `chapter` templates match with `<xsl:apply-templates/>` inside the `<root>`, and they are missing from the output when it's outside. – Jens Jan 31 '15 at 15:39
  • @Jens There is nothing like _`chapter` templates match with `<xsl:apply-templates/>` inside the `<root>`_. Your code was fine except that the output XML you were producing didn't have a root element (which would produce invalid XML or may throw an error). I just corrected it by adding `<root>` as the root element. – Lingamurthy CS Jan 31 '15 at 22:03