I need to be able to output the text of an XML document separated by line breaks. In other words, the XML:

<programlisting>
public static void main(String[] args){
    System.out.println("Be happy!");  
    System.out.println("And now we add annotations.");  
}
</programlisting>

needs to be represented as:

<para>public static void main(String[] args){</para>
<para>    System.out.println("Be happy!"); </para>
<para>    System.out.println("And now we add annotations.");  </para>
<para>}</para>

I thought that I should be able to use substring-before(., '\n') but for some reason it's not recognizing the line break.

I also tried to output each line as a CDATA section so that I could pull those separately, but ran into the fact that they're all smushed together into a single text node.

I'm just using regular Java here for transformation. Any ideas on how to accomplish this?

Thanks...

NickChase

2 Answers

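As explained in a related answer, an XML parser normalizes every line break to a single line feed character (U+000A), which can be written in a stylesheet as the character reference &#10;. XPath string literals have no backslash escapes, so '\n' is just a backslash followed by the letter n, and that is why it never matches. To split a string at a line break, you have to split at &#10;.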
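To make the difference concrete, here is a minimal Java sketch using the JDK's javax.xml.xpath API. The class name LineBreakXPathDemo, the helper eval and the sample XML are made up for illustration. It evaluates substring-before once with a backslash-n literal and once with a real line feed.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class LineBreakXPathDemo {

    // Hypothetical helper: evaluates an XPath expression against XML text
    // and returns the result as a string.
    static String eval(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<programlisting>first line\nsecond line</programlisting>";

        // "\\n" in Java source reaches XPath as the two characters backslash and n,
        // which is what '\n' typed into a stylesheet means. It never matches.
        String withBackslashN = eval(xml, "substring-before(/programlisting, '\\n')");

        // "\n" in Java source reaches XPath as a real line feed (U+000A).
        String withLineFeed = eval(xml, "substring-before(/programlisting, '\n')");

        System.out.println("[" + withBackslashN + "]"); // prints []
        System.out.println("[" + withLineFeed + "]");   // prints [first line]
    }
}
```

Inside an XSLT file the line feed has to be written as &#10;, because a newline typed literally into an attribute value is normalized to a space by the XML parser.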

Therefore, a solution in plain XSLT 1.0 (without extensions) can look like:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output indent="yes"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="programlisting/text()">
    <xsl:param name="text" select="."/>
    <para>
      <!-- Because we would rely on $text containing a line break when using 
           substring-before($text,'&#10;') and the last line might not have a
           trailing line break, we append one before doing substring-before().  -->
      <xsl:value-of select="substring-before(concat($text,'&#10;'),'&#10;')"/>
    </para>
    <xsl:if test="contains($text,'&#10;')">
      <xsl:apply-templates select=".">
        <xsl:with-param name="text" select="substring-after($text,'&#10;')"/>
      </xsl:apply-templates>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
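Since the question mentions running the transformation from plain Java, here is a hedged sketch of how the stylesheet above could be applied with the JDK's built-in javax.xml.transform API. The class name SplitLinesTransform and the method transform are hypothetical. The stylesheet is inlined as a string with the comment dropped, and indent="yes" is swapped for omit-xml-declaration="yes" to keep the output compact.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SplitLinesTransform {

    // Copy of the XSLT 1.0 stylesheet above (assumption: output settings simplified).
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">\n"
      + "  <xsl:output omit-xml-declaration=\"yes\"/>\n"
      + "  <xsl:template match=\"@*|node()\">\n"
      + "    <xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>\n"
      + "  </xsl:template>\n"
      + "  <xsl:template match=\"programlisting/text()\">\n"
      + "    <xsl:param name=\"text\" select=\".\"/>\n"
      + "    <para><xsl:value-of select=\"substring-before(concat($text,'&#10;'),'&#10;')\"/></para>\n"
      + "    <xsl:if test=\"contains($text,'&#10;')\">\n"
      + "      <xsl:apply-templates select=\".\">\n"
      + "        <xsl:with-param name=\"text\" select=\"substring-after($text,'&#10;')\"/>\n"
      + "      </xsl:apply-templates>\n"
      + "    </xsl:if>\n"
      + "  </xsl:template>\n"
      + "</xsl:stylesheet>\n";

    // Applies the stylesheet to the given XML text and returns the serialized result.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<programlisting>\n"
                   + "public static void main(String[] args){\n"
                   + "    System.out.println(\"Be happy!\");\n"
                   + "}\n"
                   + "</programlisting>";
        System.out.println(transform(xml));
    }
}
```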

With your given XML source, this outputs an empty <para> element for the line break right after the opening tag and another for the one right before the closing tag. One could also filter out empty lines (like Dimitre does), but that also removes empty lines in the middle of the code listing. If you need to drop empty lines at the start and end while keeping those in the middle, a more elaborate approach is required.

This is just demonstrating that the task is not difficult at all using plain XSLT 1.0.
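For the case where empty lines should be dropped only at the start and end, one option is to do the splitting in Java instead, since the transformation is driven from Java anyway. This is a different technique from the recursive template, shown only as a sketch. The class and method names are hypothetical, and the method returns raw strings, so XML escaping is left to whatever builds the para elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrimOuterBlankLines {

    // Splits text at line feeds and drops blank lines at the start and end only.
    // Blank lines between non-blank lines are kept.
    static List<String> splitKeepingInnerBlanks(String text) {
        // The -1 limit keeps trailing empty strings, so they are trimmed explicitly below.
        List<String> lines = Arrays.asList(text.split("\n", -1));
        int start = 0;
        int end = lines.size();
        while (start < end && lines.get(start).trim().isEmpty()) {
            start++;
        }
        while (end > start && lines.get(end - 1).trim().isEmpty()) {
            end--;
        }
        return new ArrayList<>(lines.subList(start, end));
    }

    public static void main(String[] args) {
        String text = "\nint a = 1;\n\nint b = 2;\n";
        // prints [int a = 1;, , int b = 2;]
        System.out.println(splitKeepingInnerBlanks(text));
    }
}
```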

Thomas W

I. XSLT 2.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
  <!-- Split at LF, optionally preceded by CR; the [.] predicate drops empty tokens -->
  <xsl:for-each select="tokenize(., '\r?\n')[.]">
   <para><xsl:sequence select="."></xsl:sequence></para>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<programlisting>
public static void main(String[] args){
    System.out.println("Be happy!");
    System.out.println("And now we add annotations.");
}
</programlisting>

the wanted, correct result is produced:

<programlisting>
   <para>public static void main(String[] args){</para>
   <para>    System.out.println("Be happy!");</para>
   <para>    System.out.println("And now we add annotations.");</para>
   <para>}</para>
</programlisting>
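A note on the separator regex: a pattern with the optional carriage return before the line feed ('\r?\n') handles CRLF pairs, while the reverse order ('\n\r?') leaves a stray carriage return on each line. An XML parser already normalizes literal CRLF to LF, so the difference only shows up when a carriage return survives parsing, for example via a character reference. The Java sketch below mimics tokenize(...)[.] with String.split plus a filter. It uses java.util.regex rather than the XPath 2.0 regex dialect, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TokenizeLinesDemo {

    // Rough analogue of tokenize($text, $regex)[.]:
    // split at the regex, then drop zero-length tokens.
    static List<String> nonEmptyTokens(String text, String regex) {
        return Arrays.stream(text.split(regex, -1))
                     .filter(token -> !token.isEmpty())
                     .collect(Collectors.toList());
    }

    // Makes carriage returns visible when printing.
    static String show(List<String> tokens) {
        return tokens.toString().replace("\r", "\\r");
    }

    public static void main(String[] args) {
        String crlfText = "first\r\nsecond\r\n";

        // Optional CR before LF: the whole CRLF pair is consumed as the separator.
        System.out.println(show(nonEmptyTokens(crlfText, "\\r?\\n"))); // prints [first, second]

        // LF then optional CR: the CR stays attached to the end of each line.
        System.out.println(show(nonEmptyTokens(crlfText, "\\n\\r?"))); // prints [first\r, second\r]
    }
}
```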

II. XSLT 1.0 solution, using the str-split-to-words template of FXSL:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common" exclude-result-prefixes="ext">
  <xsl:import href="strSplit-to-Words.xsl"/>
  <xsl:output indent="yes" omit-xml-declaration="yes"/>

   <xsl:strip-space elements="*"/>
   <xsl:output indent="yes" omit-xml-declaration="yes"/>

   <xsl:param name="pDelims" select="'&#xA;&#xD;'"/>

    <xsl:template match="/">
      <xsl:variable name="vwordNodes">
        <xsl:call-template name="str-split-to-words">
          <xsl:with-param name="pStr" select="/"/>
          <xsl:with-param name="pDelimiters"
                          select="$pDelims"/>
        </xsl:call-template>
      </xsl:variable>

      <xsl:apply-templates select=
      "ext:node-set($vwordNodes)/*[normalize-space()]"/>
    </xsl:template>

    <xsl:template match="word">
      <para><xsl:value-of select="."/></para>
    </xsl:template>
</xsl:stylesheet>

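When this transformation is applied to the same XML document (above), the same <para> elements are produced, this time without the <programlisting> wrapper, because the template matching / outputs only the split lines: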

<para>public static void main(String[] args){</para>
<para>    System.out.println("Be happy!");</para>
<para>    System.out.println("And now we add annotations.");</para>
<para>}</para>
Dimitre Novatchev
  • That works beautifully, thank you! XSLT 2.0 isn't available to me, but once I got all the pieces of FXSL at http://sourceforge.net/projects/fxsl/files/fxsl/FXSL%201.2/ it was smooth sailing. – NickChase Dec 30 '12 at 20:35
  • This only gives the answer and the output, but does not explain why it works. – Xavier Dass Sep 07 '17 at 03:30
  • @XavierDass, Correct. It isn't possible to explain the foundations of XSLT in one SO answer, not to mention the FXSL library for functional programming in XSLT 1.0/2.0. In case you are interested in these, I can recommend my own Pluralsight course "XSLT 2.0 and 1.0 Foundations" at https://www.pluralsight.com/courses/xslt-foundations-part1 and the four articles on FXSL 1.0 pointed to on this page: http://fxsl.sourceforge.net/ Learning something new is much more valuable than downvoting an answer simply because we don't understand it. – Dimitre Novatchev Sep 07 '17 at 04:47