7

I have a question for the clever people of the SO community.

Below is a snippet of XML generated by the Symphony CMS.

   <news>
        <entry>
            <title>Lorem Ipsum</title>
            <body>
                <p><strong>Lorem Ipsum</strong></p>
                <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
                <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna.</p>
            </body>
        </entry>
    </news>

What I need to do is take a portion of the <body> element, based on a specified length, for display in the blog style of:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem... more

...where more is a link to the full news item. I know I can select specific paragraphs and I also know I can use the substring function to bring a specified number of characters. However, I need to preserve the formatting of the text, i.e. the HTML tags within the <body> element.

I realise this raises issues of tag closure but there must surely be a way. Hopefully someone more experienced with XSLT can shed some light on this issue.

Mathias Müller
  • 22,203
  • 13
  • 58
  • 75
Neil Albrock
  • 1,035
  • 10
  • 16
  • XSLT2.0, I hope ? Because in XSLT1.0 that would be extra-tricky... – VonC Feb 10 '09 at 13:10
  • xslt1.0, right... I have found something, and I hope it will give you some pointer... – VonC Feb 10 '09 at 13:39
  • @Neil Albrock You need to download and unzip FXSL 1.x from http://www.sf.net/projects/fxsl and adjust the xsl:import/@href to the exact path of the scanl.xsl file. In case you have problems, let me know and I'll provide the files – Dimitre Novatchev Feb 10 '09 at 21:22
  • @Neil Albrock It would be good if you could send me the exact code. Use my userid -- with the gmail email. – Dimitre Novatchev Feb 16 '09 at 17:52

5 Answers5

5

Here is a complete XSLT 1.0 transformation that solves exactly the problem.

This XSLT transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 xmlns:f="http://fxsl.sf.net/"
 xmlns:myAdd="f:myAdd"
 xmlns:myParam="f:myParam"
 exclude-result-prefixes="ext f myAdd myParam"
>
 <xsl:import href="scanl.xsl"/>
 <!--                                         -->
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <!--                                         -->
 <myAdd:myAdd/>
 <myParam:myParam>0</myParam:myParam>
 <!--                                         -->
 <xsl:param name="pTruncateLength" select="772"/>
 <!--                                         -->
   <xsl:variable name="vFun" select="document('')/*/myAdd:*[1]"/>
   <xsl:variable name="vZero" select="document('')/*/myParam:*[1]"/>
 <!--                                         -->
   <xsl:variable name="vrtfScanResults">
           <xsl:call-template name="scanl">
             <xsl:with-param name="pFun" select="$vFun"/>
             <xsl:with-param name="pQ0" select="$vZero" />
             <xsl:with-param name="pList" select="/*/*/body//text()"/>
           </xsl:call-template>
   </xsl:variable>
 <!--                                         -->
   <xsl:variable name="vScanResults"
        select="ext:node-set($vrtfScanResults)"/>
   <xsl:variable name="vindNode" select=
    "count($vScanResults/*[. > $pTruncateLength][1]
                                   /preceding-sibling::*)"/>
 <!--                                         -->
   <xsl:variable name="vrtfTruncInfo">
       <xsl:for-each select="/*/*/body//text()">
 <!--                                         -->
         <xsl:variable name="vPos" select="position()"/>
         <tNode id="{generate-id()}">
           <xsl:attribute name="preserve">
             <xsl:if test="$vPos &lt; $vindNode">
               <xsl:value-of select="string-length(.)"/>
             </xsl:if>
             <xsl:if test="$vPos > $vindNode">
               <xsl:value-of select="0"/>
             </xsl:if>
             <xsl:if test="$vPos = $vindNode">
               <xsl:value-of select=
               "$vScanResults/*[$vindNode+1]
               -
                $pTruncateLength"/>
             </xsl:if>
           </xsl:attribute>
         </tNode>
       </xsl:for-each>
   </xsl:variable>
 <!--                                         -->
   <xsl:variable name="vTruncInfo" select="ext:node-set($vrtfTruncInfo)"/>
 <!--                                         -->
 <xsl:template match="node()|@*">
   <xsl:copy>
     <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>
 <!--                                         -->
 <xsl:template match="text()[ancestor::body]">
   <xsl:variable name="vAllowedLength"
        select="$vTruncInfo/*[@id = generate-id(current())]/@preserve"
   />
 <!--                                         -->
   <xsl:value-of select="substring(.,1,$vAllowedLength)"/>

   <xsl:if test="string-length(.) > $vAllowedLength
               and
                 $vAllowedLength > 0
                ">
     <strong> ...more</strong>
   </xsl:if>
 </xsl:template>
 <!--                                         -->
 <xsl:template match="myAdd:*" mode="f:FXSL">
   <xsl:param name="pArg1"/>
   <xsl:param name="pArg2"/>
   <xsl:value-of select="$pArg1 + string-length($pArg2)"/>
 </xsl:template>
</xsl:stylesheet>

when applied on the original source XML document:

<news>
    <entry>
        <title>Lorem Ipsum</title>
        <body>
            <p>
                <strong>Lorem Ipsum</strong>
            </p>
            <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
            <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna.</p>
            <p>This text should not be displayed</p>
        </body>
    </entry>
</news>

produces the wanted result:

<news>
   <entry>
      <title>Lorem Ipsum</title>
      <body>
         <p>
            <strong>Lorem Ipsum</strong>
         </p>
         <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed malesuada auctor magna. Vivamus urna justo, pulvinar nec, sagittis malesuada, accumsan in, massa. Quisque mi purus, gravida eget, ultricies a, porta in, sem. Maecenas justo elit, elementum vel, feugiat vulputate, pulvinar nec, velit. Fusce vel ante et diam bibendum euismod. Nunc vel nulla non lorem dignissim placerat. Nulla magna massa, auctor et, tempor nec, auctor sit amet, turpis. Quisque odio lacus, auctor at, posuere id, suscipit eget, dui. Phasellus aliquam. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Proin varius. Phasellus cursus. Cras mattis adipiscing turpis. Sed.</p>
         <p>Lorem <strong> ...more</strong>
         </p>
         <p/>
      </body>
   </entry>
</news>

Do note the following:

  1. The scanl stylesheet from the FXSL library is imported. This template is commonly used to accumulate data from processing a list of items. The function (the template matching myAdd:*) that does the actual processing is passed as a parameter to the scanl template. The other parameter that must be passed to it is the "initial" value from processing, which is to be returned if the passed list of items is empty.

  2. The global parameter $pTruncateLength holds the maximum string length exceeding which the text must be truncated

Dimitre Novatchev
  • 240,661
  • 26
  • 293
  • 431
5

Here's my version. I've tested it over your XML sample and it works.

To invoke it, use <xsl:apply-templates select="path/to/body/*" mode="truncate"/>.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:strip-space elements="*"/>

<!-- limit: the truncation limit -->
<xsl:variable name="limit" select="250"/>

<!-- t: Total number of characters in the set -->
<xsl:variable name="t" select="string-length(normalize-space(//body))"/>

<xsl:template match="*" mode="truncate">
    <xsl:variable name="preceding-strings">
        <xsl:copy-of select="preceding::text()[ancestor::body]"/>
    </xsl:variable>

    <!-- p: number of characters up to the current node -->
    <xsl:variable name="p" select="string-length(normalize-space($preceding-strings))"/>

    <xsl:if test="$p &lt; $limit">
        <xsl:element name="{name()}">
            <xsl:apply-templates select="@*" mode="truncate"/>
            <xsl:apply-templates mode="truncate"/>
        </xsl:element>
    </xsl:if>
</xsl:template>

<xsl:template match="text()" mode="truncate">
    <xsl:variable name="preceding-strings">
        <xsl:copy-of select="preceding::text()[ancestor::body]"/>
    </xsl:variable>

    <!-- p: number of characters up to the current node -->
    <xsl:variable name="p" select="string-length(normalize-space($preceding-strings))"/>

    <!-- c: number of characters including current node -->
    <xsl:variable name="c" select="$p + string-length(.)"/>

    <xsl:choose>
        <xsl:when test="$limit &lt;= $c">
            <xsl:value-of select="substring(., 1, ($limit - $p))"/>
            <xsl:text>&#8230;</xsl:text>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="."/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

<xsl:template match="@*" mode="truncate">
    <xsl:attribute name="{name(.)}"><xsl:value-of select="."/></xsl:attribute>
</xsl:template>

</xsl:stylesheet>
  • Hello there fellow Symphonist :) I'm trying to get (any) truncate utility to work with some formatted content and am struggling. This particular solution returns a truncated version of every tag within the content (i.e. a truncated

    , a truncated

      , a truncated

      etc.) rather than a truncated version of the content as a whole. Any ideas? I'm using the 'Editor' markdown formatter for input.

    – Nathan Hornby Oct 07 '15 at 13:15
  • Ah, OK I managed to get this utility working by replacing 'body' in the script with my own node (in this case 'content'), however now I'm getting 3 ellipses at the end of the truncated text :o – Nathan Hornby Oct 07 '15 at 13:29
3

What you are asking is an XSLT ellipsis generator.

May be this xslt 1.0 template might give you some idea:

Here is the main gist of it:

<xsl:template match="text()" mode="label">
    <xsl:param name="self-x"/>
    <xsl:param name="self-y"/>
    <xsl:variable name="text" select="normalize-space(.)"/>
    <!-- a quick and dirty way to avoid problems with line breaks -->
    <!-- replace the select attribute with this call
         if you want to use a fancier way to escape whitespace
         characters:
          <xsl:call-template name="escape-ws"
            <xsl:with-param name="text" select="." /
          </xsl:call-template
    -->
    <use xlink:href="#text-box" transform="translate({$self-x} 
 {$self-y})"/>
    <!-- text nodes are marked with a little box -->
    <text x="{$self-x + $writing-bump-over}"
          y="{$self-y - $writing-bump-up}"
          style="{$text-font-style}; stroke:none; fill:{$text-color}">
      <xsl:text>"</xsl:text>
      <xsl:value-of select="substring($text,1,$max-text-length)"/>
      <!-- truncate the text node to $max-text-length -->
      <xsl:if test="string-length($text) &gt; $max-text-length">
        <!-- add an ellipsis if necessary -->
        <xsl:text>...</xsl:text>
      </xsl:if>
      <xsl:text>"</xsl:text>
    </text>
  </xsl:template>

Note:

  • you will need to replace the ellipsis by a link, but the main idea is there.
  • this represents only a small extract of the all script
  • you may not need everything in it: if you need "<use xlink:href="...", you need to declare the xlink namespace
VonC
  • 1,262,500
  • 529
  • 4,410
  • 5,250
  • Thanks so much, I will give this a try and let you know how I get on. – Neil Albrock Feb 10 '09 at 13:41
  • What is the element? What is the meaning of the two parameters? The $writing-bump-xxx variables are undefined! Even if your purpose was just to give an idea, this answer fails to achieve this for now. – Dimitre Novatchev Feb 10 '09 at 14:04
  • @Dimitre the purpose was to show only the part dealing with the max-length of the string a producing the ellipsis. The rest is defined in the xslt script in the html address I mention. Plus the use part may not be needed for what Neil is after. – VonC Feb 10 '09 at 14:25
  • @VonC Not a useful answer for me. I will provide something more meaningful. – Dimitre Novatchev Feb 10 '09 at 14:30
  • I appreciate your help, hopefully through some combined effort we can sort this one out ;-) – Neil Albrock Feb 10 '09 at 16:56
0

After much hacking, I came to this solution:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!--
    Author: Neil Albrock
    Version: 1.0
    Description: Truncate by a character limit and retain HTML content.
    Usage: 
        <xsl:call-template name="truncate">
            <xsl:with-param name="data" select="path/to/your/body" />
            <xsl:with-param name="length" select="250" />
            <xsl:with-param name="link" select="'href'" />
        </xsl:call-template>
-->

<xsl:template name="truncate">

    <!-- The node set to be worked on. -->
    <xsl:param name="data"/>
    <!-- The desired truncate length. Default to length of data. -->
    <xsl:param name="length" select="string-length($data)"/>
    <!-- More link -->
    <xsl:param name="link"/>

    <xsl:choose>
        <!-- Return whole data if it's within length. -->
        <xsl:when test="string-length($data) &lt;= $length">
            <xsl:copy-of select="$data" />
        </xsl:when>
        <!-- Truncate to desired length. -->
        <xsl:otherwise>
            <xsl:for-each select="$data/*">
                <xsl:variable name="this-node" select="string-length(.)"/>
                <xsl:variable name="preceding-nodes">
                    <xsl:copy-of select="preceding-sibling::*"/>
                </xsl:variable>
                <xsl:variable name="node-sum" select="string-length(normalize-space($preceding-nodes))"/>
                <xsl:variable name="limit" select="$node-sum + $this-node"/>

                <xsl:choose>
                    <xsl:when test="$limit &gt; $length and $node-sum &lt;= $length">
                        <p>
                        <xsl:value-of select="substring(.,1,$length - $node-sum)"/>
                        <xsl:text>&#8230;</xsl:text>
                        <a>
                            <xsl:attribute name="href">
                                <xsl:value-of select="$link"/>
                            </xsl:attribute>
                            <xsl:text>more</xsl:text>
                        </a>
                        </p>
                    </xsl:when>
                    <xsl:when test="$limit &lt; $length">
                        <xsl:copy-of select="."/>
                    </xsl:when>
                    <xsl:otherwise/>
                </xsl:choose>

            </xsl:for-each>
        </xsl:otherwise>
    </xsl:choose>

</xsl:template>

</xsl:stylesheet>

I would use the solution by Chaotic Pattern though, it's more elegant ;-)

Neil Albrock
  • 1,035
  • 10
  • 16
-3

This will be an episode in pain using XSLT. I would strongly recommend using a scripting language like Perl/Python to attempt this.

Rob Di Marco
  • 43,054
  • 9
  • 66
  • 56
  • You may be down-voted, but you're right. I can't believe how much XSLT is required to accomplish a seemingly simple text transformation problem. – Tyson Feb 10 '09 at 20:05
  • 1
    @Tyson Let's see your solution then -- in any language you like. Remember: the existing formatting must be left intact, and the length above which to truncate is cumulative. Have fun :) – Dimitre Novatchev Feb 11 '09 at 05:31