
In my text-based XML corpus (using the TEI schema) I have a lot of markup for different kinds of data. As part of transforming these documents into a PDF, I preprocess them into a simplified file for XSL-FO to transform. During that preprocessing I assign footnote numbers by finding the relevant markup and adding <sup>incremented integer</sup> after it.

A line like this:

<p>
  <seg>
    <date type="deposition_date">Item anno et die quo supra</date>. <persName>P Lapassa Senior</persName> testis iuratus idem per omnia quod predictus <persName>Hugo de Mamiros</persName>.
  </seg>
</p>

Processed with this:

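<xsl:template match="tei:date">
    <!-- Copy the date, then append its footnote number (deposition dates counted within the current tei:p) -->
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy><sup><xsl:number count="tei:date[@type='deposition_date']" from="tei:p" format="1" level="any"/></sup>
</xsl:template>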

Outputs this (note the line break before <sup>):

<p>
  <seg>
    <date type="deposition_date">Item anno et die quo supra</date>
    <sup>1</sup>. <persName>P Lapassa Senior</persName> testis iuratus idem per omnia quod predictus <persName>Hugo de Mamiros</persName>.
  </seg>
</p>

As a result, when XSL-FO renders the <sup> as superscript, there is a space between the target text and the superscript, like so:

Item anno et die quo supra 1. P Lapassa Senior testis iuratus idem per omnia quod predictus Hugo de Mamiros.

Is there a way to stop line breaks from being introduced during the copy process?

Additional info: I have <xsl:strip-space elements="*"/> in the XSLT stylesheet. Tested with Saxon PE 9.6 and HE 9.8.

Thanks in advance.

jbrehr
  • Is that a line break copied from the input? Given the code snippets you have presented, I don't see that line break in the input; I would rather assume it is introduced by using `xsl:output indent="yes"`. – Martin Honnen Dec 28 '17 at 08:08
  • I find it extraordinary that you worked that out - it was indeed the culprit. A follow-up question: why would an output formatting option introduce extra whitespace? When I indent manually, as I have in many places to make the code easier to read, it doesn't cause the same problem. Many thanks again for giving us the benefit of your experience. – jbrehr Dec 28 '17 at 08:30
  • Well, it depends on various factors, but if you indent the code you write inside an XSLT stylesheet, you have to consider that https://www.w3.org/TR/xslt20/#stylesheet-stripping causes most whitespace-only text nodes in stylesheets to be stripped by default. For source documents the rules are different: https://www.w3.org/TR/xslt20/#strip. – Martin Honnen Dec 28 '17 at 08:42
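Since the comments identify the serializer's indent="yes" as the source of the line break, the corresponding change in the preprocessing stylesheet is a one-line fragment. This is a minimal sketch, assuming the stylesheet otherwise keeps its existing xsl:output settings (method="xml" is assumed here):

```xml
<!-- Turn serializer indentation off ("no" is also the default for method="xml").
     The serializer then adds no line break between </date> and the generated <sup>,
     so no stray space appears before the superscript in the PDF. -->
<xsl:output method="xml" indent="no"/>
```

Manual indentation inside the stylesheet itself does not cause the same problem, because whitespace-only text nodes in the stylesheet are stripped before the transformation runs.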

1 Answer


If you want indented output, but some elements have mixed content where it's not safe to let the serializer insert whitespace before or after child elements, you can control this with the suppress-indentation serialization parameter, new in XSLT 3.0. For example, <xsl:output indent='yes' suppress-indentation='p'/> will stop any whitespace from being inserted within the content of a p element.
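A sketch of the suppress-indentation approach as a complete XSLT 3.0 stylesheet follows. It assumes an XSLT 3.0 processor, that the tei: prefix is bound to the TEI namespace http://www.tei-c.org/ns/1.0, and that the p elements are copied into the output unchanged (still in the TEI namespace), which is why the list names tei:p rather than the unprefixed p used above. The xsl:mode shorthand for the identity copy is also an assumption about how the rest of the preprocessing works.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: keep indentation in general, but never indent inside tei:p (mixed content). -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- indent="yes" applies everywhere except within tei:p elements and their content. -->
  <xsl:output method="xml" indent="yes" suppress-indentation="tei:p"/>

  <!-- XSLT 3.0 shorthand: nodes with no matching template are shallow-copied. -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Copy the date, then append its footnote number as a superscript. -->
  <xsl:template match="tei:date">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <sup><xsl:number count="tei:date[@type='deposition_date']" from="tei:p" level="any" format="1"/></sup>
  </xsl:template>
</xsl:stylesheet>
```

If the output p elements are in no namespace, the value would be suppress-indentation="p" instead.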

Michael Kay