
Imagine that my XML file looks like this:

<root>
  <test>
    Lorem ipsum dolor sit amet, <randomTag>consectetur</randomTag>
    adipiscing elit, sed do <randomTag>eiusmod</randomTag> tempor
    incididunt ut labore et dolore magna aliqua...
  </test>
</root>

The following code doesn't work:

<xsl:template match="/">
  <xsl:apply-templates select="substring(test,1,50)"/> ...
</xsl:template>

<xsl:template match="randomTag">
  <myTag><xsl:value-of select="."/></myTag>
</xsl:template>

I expect the following output:

Lorem ipsum dolor sit amet, <myTag>consectetur</myTag> 
adipiscing...

If I replace substring(test,1,50) with test it works, but I want only the first 50 characters.

I've also tried using a variable in this way:

<xsl:template match="/">
  <xsl:variable name="aux" select="substring(test,1,50)"/>
  <xsl:apply-templates select="$aux"/> ...
</xsl:template>

but this doesn't work either.

It seems to me that the problem is the substring() expression. Any advice?

nikname
  • Can you explain what you are actually trying to achieve? I don't see anything in your output that cuts off at 50 characters, so the logic is not at all clear. – michael.hor257k Jul 23 '15 at 00:02
  • I'm sorry. In the example I had cut 56 characters. It's correct now, isn't it? – nikname Jul 23 '15 at 17:04
  • I don't know if it is correct, because you didn't say what the rule is. It looks like you're trying to output only the first 50 characters of the entire text contained within the `test` element and its descendants. If the `test` element has only children (as shown in your example), then Michael Kay gave you the answer. Otherwise it's (even) more complicated. – michael.hor257k Jul 23 '15 at 17:28
  • BTW, are you using XSLT 1.0 or 2.0? – michael.hor257k Jul 23 '15 at 18:08
  • I have to do the same thing that is explained [here](http://stackoverflow.com/questions/532147/truncate-xml-with-xslt?rq=1) (I only just found that question). Note that I need to apply templates to the node's children. I've tried to follow the answers to the linked question, but I haven't got any results yet. I'm using XSLT 2.0. – nikname Jul 24 '15 at 13:37
  • I made a mistake. I'm using XSLT 1.0. Does this change something? – nikname Jul 25 '15 at 15:03
  • "*I'm using XSLT 1.0. Does this change something?*" Yes, it [does](http://stackoverflow.com/questions/31575016/apply-templates-to-a-substring-in-xslt/31620394?noredirect=1#comment51204145_316203940). – michael.hor257k Jul 25 '15 at 16:03

3 Answers


<xsl:apply-templates> only works with a node-set, i.e. a set of nodes that was in the original document.

In many XSLT processors, you can turn a result tree fragment (for example, a variable that contains constructed nodes) into an additional node-set with the exsl:node-set() extension function.

o11c

Until XSLT 3.0, apply-templates must select nodes (not strings or other atomic values), and match patterns can only match nodes. The substring() function delivers a string, and discards any information about elements. So yes, the substring() expression is the problem.
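To see what substring() actually hands to apply-templates, here is a rough Python analogy using the stdlib `xml.etree.ElementTree`. It is not XSLT. The question's `<test>` element is written compactly, without its line breaks, and that is an assumption of this sketch. Taking the string value of the element and slicing it leaves a plain string, with no element left for a `randomTag` template to match.

```python
import xml.etree.ElementTree as ET

test = ET.fromstring(
    "<test>Lorem ipsum dolor sit amet, <randomTag>consectetur</randomTag> "
    "adipiscing elit, sed do <randomTag>eiusmod</randomTag> tempor</test>"
)

# Like XPath string(test): all descendant text nodes concatenated; the tags are lost.
flat = "".join(test.itertext())

# substring(test, 1, 50) is 1-based, so it corresponds to the slice [:50].
first50 = flat[:50]
print(first50)  # Lorem ipsum dolor sit amet, consectetur adipiscing
```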

So how do you solve this problem? The answer is a technique called "sibling recursion". You apply templates (generally in a particular mode) to the first child only. The template that processes it then does apply-templates on the immediately following sibling, and so on. With each apply-templates you pass a parameter indicating when to stop: for example, set it to 50 initially, decrement it by the number of characters in each node as that node is processed, and terminate the recursion when it reaches zero.
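The control flow of sibling recursion can be sketched in Python (stdlib `xml.etree.ElementTree`). This is a model of the idea, not an XSLT implementation, and the helper names are hypothetical:

- `summarize` plays the role of a template applied to the first node of a sibling list.
- It then "applies templates" to the following siblings with `limit` reduced by the string length of the node it just handled.
- An element's own children start with the element's current `limit`. No value has to flow back out of a recursive call, which matches what XSLT templates can do.

The `randomTag` to `myTag` rename comes from the question. Output is built as plain strings without XML escaping, for brevity.

```python
import xml.etree.ElementTree as ET

def child_nodes(elem):
    """Child nodes of elem in document order, XPath-style:
    text nodes as str, element nodes as Element."""
    nodes = [elem.text] if elem.text else []
    for child in elem:
        nodes.append(child)
        if child.tail:
            nodes.append(child.tail)
    return nodes

def string_value(node):
    """Like XPath string(): all descendant text concatenated."""
    return node if isinstance(node, str) else "".join(node.itertext())

def summarize(nodes, limit, out):
    """Sibling recursion: process nodes[0], then recurse on the following
    siblings with limit reduced by the length of nodes[0]'s text."""
    if not nodes or limit <= 0:
        return  # no more siblings, or the budget is used up: stop
    first = nodes[0]
    if isinstance(first, str):
        out.append(first[:limit])
    else:
        # The element's children are processed with the current limit.
        tag = "myTag" if first.tag == "randomTag" else first.tag
        out.append(f"<{tag}>")
        summarize(child_nodes(first), limit, out)
        out.append(f"</{tag}>")
    summarize(nodes[1:], limit - len(string_value(first)), out)

doc = ET.fromstring(
    "<test>Lorem ipsum dolor sit amet, <randomTag>consectetur</randomTag> "
    "adipiscing elit, sed do <randomTag>eiusmod</randomTag> tempor</test>"
)
out = []
summarize(child_nodes(doc), 50, out)
print("".join(out) + "...")
# Lorem ipsum dolor sit amet, <myTag>consectetur</myTag> adipiscing...
```

In XSLT terms, the last line of `summarize` corresponds to applying templates to `following-sibling::node()[1]` with an `xsl:with-param` whose value is `$limit - string-length(.)`.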

Michael Kay

As I already mentioned in the comments, if the text you want to shorten is split between the nodes of a subtree, this gets complicated. It is complicated because, conceptually at least, XSLT processes the branches of a tree in parallel, and there is no way to pass information from one branch to the next, as you can when you're doing a loop.

There are two possible approaches you could take here:

  • force the stylesheet to process the nodes sequentially;
  • have each text node calculate what has happened in the parallel branches that precede the current branch in document order.

The second option seems easier, though it is grossly inefficient. Here's a generic example:

XML

In this document, I have inserted a § as the 100th character of each item's body.

<feed>
  <item>
    <title>Declaration</title>
    <body>
      <para>When in the Course of human events, it becomes necessary for one people to <bold>dissolve the political b§ands which have connected them with another</bold>, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.</para>
      <para>We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.</para>
    </body>
  </item>
  <item>
    <title>Lorem Isum</title>
    <body>
      <para>Lorem ipsum dolor sit amet consectetuer adipiscing elit. <bold>Nam interdum ante quis <italic>erat pellentesque e§lementum.</italic> Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus.</bold> Ut molestie quam sit amet ligula.</para>
      <para>In enim. Duis dapibus hendrerit quam. Donec hendrerit lectus vel nunc. Vestibulum sit amet pede nec neque dignissim vehicula.</para>
      <para>Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos. Phasellus eget ante. Quisque risus leo, dictum sit amet, nonummy sit amet, consectetuer ut, mi.</para>
    </body>
  </item>
  <item>
    <title>Subject</title>
    <body>
      <para>Subject to change without notice.</para>
      <para>Not responsible for direct, indirect, incidental or consequential §damages resulting from any defect, error or failure to perform.</para>
      <para>May be too intense for some viewers.</para>
    </body>
  </item>
</feed>

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" version="1.0" encoding="utf-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="limit" select="100"/>

<xsl:template match="feed">
    <body>
        <xsl:apply-templates/>
    </body>
</xsl:template>

<xsl:template match="item">
    <h3><xsl:value-of select="title"/></h3>
    <xsl:apply-templates select="body"/>
    <hr/>
</xsl:template>

<xsl:template match="body">
    <p>
        <xsl:apply-templates mode="summary"/>
    </p>
</xsl:template>

<xsl:template match="bold" mode="summary">
    <b>
        <xsl:apply-templates mode="summary"/>
    </b>    
</xsl:template> 

<xsl:template match="italic" mode="summary">
    <i>
        <xsl:apply-templates mode="summary"/>
    </i>    
</xsl:template> 

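<xsl:template match="text()" mode="summary">
    <!-- all text in this body that precedes the current text node -->
    <xsl:variable name="text-before">
        <xsl:value-of select="ancestor::body//text()[current() >> .]" separator=""/>
    </xsl:variable>
    <!-- characters already used up by the preceding text nodes -->
    <xsl:variable name="used" select="string-length($text-before)"/>
    <xsl:if test="$used lt $limit">
        <!-- output only what is left of the budget -->
        <xsl:value-of select="substring(., 1, $limit - $used)"/>
        <!-- this text node reaches the limit: mark the cut -->
        <xsl:if test="string-length(.) + $used ge $limit">
            <xsl:text>...</xsl:text>
        </xsl:if>
    </xsl:if>
</xsl:template>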

</xsl:stylesheet>

Result

<body>
   <h3>Declaration</h3>
   <p>When in the Course of human events, it becomes necessary for one people to <b>dissolve the political b§...</b>
   </p>
   <hr/>
   <h3>Lorem Isum</h3>
   <p>Lorem ipsum dolor sit amet consectetuer adipiscing elit. <b>Nam interdum ante quis <i>erat pellentesque e§...</i>
      </b>
   </p>
   <hr/>
   <h3>Subject</h3>
   <p>Subject to change without notice.Not responsible for direct, indirect, incidental or consequential §...</p>
   <hr/>
</body>


As you can see, each item is cut off after exactly 100 characters, while the internal hierarchy is preserved and each type of node can be processed separately.
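For anyone without an XSLT 2.0 processor at hand, the same "look back at all preceding text" logic can be modelled in Python (stdlib `xml.etree.ElementTree`). This is a sketch, not a port of the stylesheet. The helper names and the `TAGS` mapping are hypothetical.

ElementTree stores mixed content in `.text` and `.tail`, so each text node is identified here by an `(element, attribute)` pair. Each one independently sums the lengths of all text nodes before it in the same body, which is why the approach is quadratic. `xsl:strip-space` is not modelled, so the input is assumed to have no whitespace-only text between elements.

```python
import xml.etree.ElementTree as ET

# Renaming done by the summary-mode templates; other elements (e.g. para)
# contribute only their content, like the built-in template rule.
TAGS = {"bold": "b", "italic": "i"}

def text_slots(elem):
    """Every text node under elem in document order, as (element, 'text'|'tail')."""
    slots = [(elem, "text")] if elem.text else []
    for child in elem:
        slots += text_slots(child)
        if child.tail:
            slots.append((child, "tail"))
    return slots

def summary_text(slot, all_slots, limit):
    """Model of the text() template: measure ALL preceding text on its own."""
    elem, attr = slot
    text = getattr(elem, attr)
    # Position of this node, found by identity (cf. current() >> . in XPath).
    k = next(i for i, (e, a) in enumerate(all_slots) if e is elem and a == attr)
    used = sum(len(getattr(e, a)) for e, a in all_slots[:k])
    if used >= limit:
        return ""
    piece = text[:limit - used]
    if len(text) + used >= limit:
        piece += "..."  # this text node reaches the limit
    return piece

def render(elem, all_slots, limit):
    """Model of applying the summary-mode templates to elem's children."""
    parts = []
    if elem.text:
        parts.append(summary_text((elem, "text"), all_slots, limit))
    for child in elem:
        inner = render(child, all_slots, limit)
        tag = TAGS.get(child.tag)
        parts.append(f"<{tag}>{inner}</{tag}>" if tag else inner)
        if child.tail:
            parts.append(summary_text((child, "tail"), all_slots, limit))
    return "".join(parts)

def summary_paragraph(body, limit=100):
    """Like <p><xsl:apply-templates mode="summary"/></p> for one body."""
    return "<p>" + render(body, text_slots(body), limit) + "</p>"

body = ET.fromstring(
    "<body><para>Subject to change without notice.</para>"
    "<para>Not responsible for direct, indirect</para></body>"
)
print(summary_paragraph(body, limit=40))
# <p>Subject to change without notice.Not res...</p>
```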


Note:
After I wrote this, I went to look at the other question you linked to. The accepted answer there is very similar to this, though I believe mine is a bit simpler and it handles each item separately.

michael.hor257k
  • Thanks for your answer! Unfortunately I see no output. There must be an error in the template that matches *text()*. Does `>>` stand for `&gg;`, `lt` for `<` and `ge` for `≥`? – nikname Jul 25 '15 at 14:59
  • @nikname "*Unfortunately I see no output.*" If you're using an XSLT 1.0 processor, then that's not surprising. – michael.hor257k Jul 25 '15 at 16:02