
I'm pretty new to using XSL/XSLT to perform XML transformations, and have a scenario I'm looking for some help with.

TL;DR: I am working on a C# solution to escape certain characters in MathML, specifically in <mtext> nodes. The characters include, but are not necessarily limited to, {, }, [, and ], which need to become \{, \}, \[, and \] respectively. Having seen some of the interesting things people have done with XSLT transformations, I figured I would give that a shot.

For reference, here's a sample block of MathML:

<math style='font-family:Times New Roman' xmlns='http://www.w3.org/1998/Math/MathML'>
    <mstyle mathsize='15px'>
        <mrow>
            <mtext>4&#160;___&#160;{</mtext>
            <mtext mathvariant='italic'>x</mtext>
            <mtext>:&#160;</mtext>
            <mtext mathvariant='italic'>x</mtext>
            <mtext>&#160;is&#160;a&#160;natural&#160;number&#160;greater&#160;than&#160;4}</mtext>
        </mrow>
    </mstyle>
</math>

Fiddling around, I have found that using this XSL, I can print out the contents of each <mtext> element:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*" />
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="/">
        <xsl:for-each select="//*[local-name()='mtext']">
            <xsl:variable name="myMTextVal" select="text()" />
            <xsl:message terminate="no">
                <xsl:value-of select="$myMTextVal"/>
            </xsl:message>
        </xsl:for-each>
    </xsl:template>
    
</xsl:stylesheet>

My first thought, which quickly turned out to be the wrong road to go down, was to use translate() in the for-each loop as an XSLT 1.0 stand-in for XSLT 2.0's replace():

<!-- Outside of the looping template -->
<xsl:param name="braceOpen" select="'{'" />
<xsl:param name="braceOpenReplace" select="'\{'" />

<!-- In the loop itself -->
<xsl:value-of select="translate(//*[local-name()='mtext']/text(), $braceOpen, $braceOpenReplace)"/>

The limitation of translate(), which only maps characters one-to-one, quickly became apparent when the first mtext's content started to display as "4 ___ \" rather than "4 ___ \{".
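A minimal sketch of why this happens, using a literal string in place of the mtext content: translate() pairs the characters of its second and third arguments by position, so a one-character search string can only ever produce a one-character result.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- '{' is the 1st character of the 2nd argument, so it maps to the
             1st character of the 3rd argument, which is '\'.
             The trailing '{' in the 3rd argument has no partner and is ignored. -->
        <xsl:value-of select="translate('4 ___ {', '{', '\{')"/>
        <!-- output: 4 ___ \ -->
    </xsl:template>
</xsl:stylesheet>
```

This is why a one-to-many replacement in XSLT 1.0 needs a recursive named template rather than translate().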

So digging some more, I ran across these threads:

XSLT string replace

XSLT Replace function not found

both of which offered an alternative solution in lieu of replace(). So I set up a test of:

<xsl:template name="ProcessMathText">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="by"/>
    <xsl:choose>
        <xsl:when test="contains($text,$replace)">
            <xsl:value-of select="substring-before($text,$replace)"/>
            <xsl:value-of select="$by"/>
            <xsl:call-template name="ProcessMathText">
                <xsl:with-param name="text" select="substring-after($text,$replace)"/>
                <xsl:with-param name="replace" select="$replace"/>
                <xsl:with-param name="by" select="$by"/>
            </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$text"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

and placed this within the for-each block:

<xsl:otherwise>
    <xsl:variable name="mTextText" select="Text" />
    <xsl:call-template name="ProcessMathText">
        <xsl:with-param name="text" select="$mTextText"/>
        <xsl:with-param name="replace" select="'{'"/>
        <xsl:with-param name="by" select="'\{'"/>
    </xsl:call-template>
</xsl:otherwise>

However, that began to throw "'xsl:otherwise' cannot be a child of the 'xsl:for-each' element." errors. I'm not 100% sure how to "invoke" the <xsl:otherwise> content from the links above without placing it inside the for-each block, which is what I'm wired to do based on my history with AS, JS, Python, and C#. I was hoping someone might be able to help me out, or point me in a direction that might yield results, rather than me just banging my head against a wall.

One other possible issue I have noticed on the output... It looks like the transformation results in losing the HTML entity characters such as &#160;, and having them replaced with " ", which is something I do not want, as that could cause some annoying headaches down the line. Is there a way to maintain the structure, and only replace specific content, without accidentally replacing or in a sense "rendering" HTML entities?

Thanks in advance for your help!

Michael Kay
  • Your life would be an awful lot easier if you moved forward to XSLT 2.0/3.0 rather than using these ancient 1.0-based hacks. – Michael Kay Jun 22 '22 at 17:09
  • Your xsl:for-each within xsl:template is completely wrong, because it only processes the descendant text nodes, and loses everything else. You need to follow a standard recursive-descent pattern using xsl:apply-templates, and an identity template rule, with the interesting code going in the template that matches text nodes. – Michael Kay Jun 22 '22 at 17:10
  • @MichaelKay I do not disagree, but unfortunately, that is not an option with this project. I'll look into the idea presented in your second comment, however. Thanks. – nyghtrunner Jun 22 '22 at 17:22
  • As for the entity expansion, yes, it's an unfortunate fact that XSLT expands your entity references. The simplest workaround in my experience is to preprocess the input with `s/&/§/` and then post-process the output with `s/§/&/`. – Michael Kay Jun 22 '22 at 17:43
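
The recursive-descent pattern described in the comments above can be sketched as follows. This is a minimal XSLT 1.0 sketch, not a definitive implementation: it reuses the `ProcessMathText` template from the question, handles only `{`, and the `m` prefix is an arbitrary name bound to the MathML namespace. The `encoding="US-ASCII"` setting is an assumption about the serializer: a processor that honors `xsl:output` has to write characters outside ASCII, such as the non-breaking space, as numeric character references, but a host application that supplies its own writer settings may override it.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://www.w3.org/1998/Math/MathML">

    <!-- Assumption: the serializer honors this and escapes non-ASCII
         characters as numeric character references -->
    <xsl:output method="xml" encoding="US-ASCII"/>

    <!-- Identity rule: copies every node and attribute unchanged -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only text nodes directly inside mtext get the replacement;
         this rule overrides the identity rule for those nodes -->
    <xsl:template match="m:mtext/text()">
        <xsl:call-template name="ProcessMathText">
            <xsl:with-param name="text" select="."/>
            <xsl:with-param name="replace" select="'{'"/>
            <xsl:with-param name="by" select="'\{'"/>
        </xsl:call-template>
    </xsl:template>

    <!-- Recursive replace template from the question -->
    <xsl:template name="ProcessMathText">
        <xsl:param name="text"/>
        <xsl:param name="replace"/>
        <xsl:param name="by"/>
        <xsl:choose>
            <xsl:when test="contains($text, $replace)">
                <xsl:value-of select="substring-before($text, $replace)"/>
                <xsl:value-of select="$by"/>
                <xsl:call-template name="ProcessMathText">
                    <xsl:with-param name="text" select="substring-after($text, $replace)"/>
                    <xsl:with-param name="replace" select="$replace"/>
                    <xsl:with-param name="by" select="$by"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:otherwise>
                <xsl:value-of select="$text"/>
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>
</xsl:stylesheet>
```

Because the replacement lives in a template that matches the text node rather than the `mtext` element, attributes such as `mathvariant` pass through the identity rule untouched, and no `xsl:for-each` is needed.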

2 Answers


In XSLT 3.0 it would look like this:

<xsl:stylesheet version="3.0"  
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:math="http://www.w3.org/1998/Math/MathML">

    <xsl:output method="xml" indent="yes"/>
    <xsl:mode on-no-match="shallow-copy"/>
    
    <xsl:template match="math:mtext">
      <xsl:copy>
        <!-- keep attributes such as mathvariant -->
        <xsl:apply-templates select="@*"/>
        <xsl:value-of select=". => replace('[', '\[', 'q')
                                => replace(']', '\]', 'q')
                                => replace('{', '\{', 'q')
                                => replace('}', '\}', 'q')"/>
      </xsl:copy>
    </xsl:template>
    
</xsl:stylesheet>
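
Where only XSLT 2.0 is available, the `=>` arrow operator and the `q` flag (both XPath 3.x features) cannot be used, but a single regular-expression `replace()` can do the same job. A sketch under that assumption; matching `math:mtext/text()` instead of `math:mtext` is a choice made here, not part of the answer above:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://www.w3.org/1998/Math/MathML">

    <xsl:output method="xml" indent="yes"/>

    <!-- Identity rule (XSLT 2.0 has no xsl:mode) -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- The character class matches any one of [ ] { }.
         In the replacement string, \\ is a literal backslash
         and $1 is the character that was matched. -->
    <xsl:template match="math:mtext/text()">
        <xsl:value-of select="replace(., '([\[\]{}])', '\\$1')"/>
    </xsl:template>

</xsl:stylesheet>
```

Adding another character to escape then only means extending the character class.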
Michael Kay

It is difficult to understand what your question actually is.

In order to escape "certain characters" in mtext nodes, consider the following simplified example:

XML

<math xmlns="http://www.w3.org/1998/Math/MathML">
    <mstyle mathsize="15px">
        <mrow>
            <mtext>some text{with} all kinds of (brackets) in it</mtext>
            <mtext>a different {example}</mtext>
            <mtext>no change expected here</mtext>
            <mtext>({tough one?})</mtext>
        </mrow>
    </mstyle>
</math>

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:math="http://www.w3.org/1998/Math/MathML">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="math:mtext">
    <xsl:copy>
        <!-- keep attributes such as mathvariant -->
        <xsl:apply-templates select="@*"/>
        <xsl:call-template name="escape-chars">
            <xsl:with-param name="string" select="."/>
        </xsl:call-template>
    </xsl:copy>
</xsl:template>

<xsl:template name="escape-chars">
    <xsl:param name="string"/>
    <xsl:param name="chars">(){}</xsl:param>
    <xsl:choose>
        <xsl:when test="$chars">
            <xsl:variable name="char" select="substring($chars, 1, 1)" />
            <xsl:choose>
                <xsl:when test="contains($string, $char)">
                    <!-- process substring-before with the remaining chars -->
                    <xsl:call-template name="escape-chars">
                        <xsl:with-param name="string" select="substring-before($string, $char)"/>
                        <xsl:with-param name="chars" select="substring($chars, 2)"/>
                    </xsl:call-template>
                    <!-- escape matched char -->
                    <xsl:value-of select="concat('\', $char)"/>
                    <!-- continue with substring-after -->
                    <xsl:call-template name="escape-chars">
                        <xsl:with-param name="string" select="substring-after($string, $char)"/>
                        <xsl:with-param name="chars" select="$chars"/>
                    </xsl:call-template>
                </xsl:when>
                <xsl:otherwise>
                    <!-- pass the entire string for processing with the remaining chars -->
                    <xsl:call-template name="escape-chars">
                        <xsl:with-param name="string" select="$string"/>
                        <xsl:with-param name="chars" select="substring($chars, 2)"/>
                    </xsl:call-template>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:when>
        <xsl:otherwise>
            <xsl:value-of select="$string"/>
        </xsl:otherwise>
    </xsl:choose>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<math xmlns="http://www.w3.org/1998/Math/MathML">
   <mstyle mathsize="15px">
      <mrow>
         <mtext>some text\{with\} all kinds of \(brackets\) in it</mtext>
         <mtext>a different \{example\}</mtext>
         <mtext>no change expected here</mtext>
         <mtext>\(\{tough one?\}\)</mtext>
      </mrow>
   </mstyle>
</math>
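
The question asked for `{`, `}`, `[`, and `]` rather than parentheses. Since `chars` is an ordinary parameter of `escape-chars`, one way to adapt the stylesheet above is to pass a different list from the calling template. The following is a sketch of that single template, meant to replace the `math:mtext` template in the stylesheet above rather than to run on its own; the attribute-copying line is included on the assumption that the `mathvariant` attributes in the question's sample should survive.

```xml
<xsl:template match="math:mtext">
    <xsl:copy>
        <!-- copy attributes (e.g. mathvariant) via the identity template -->
        <xsl:apply-templates select="@*"/>
        <xsl:call-template name="escape-chars">
            <xsl:with-param name="string" select="."/>
            <!-- overrides the default list "(){}" declared in escape-chars -->
            <xsl:with-param name="chars" select="'{}[]'"/>
        </xsl:call-template>
    </xsl:copy>
</xsl:template>
```

The recursive calls inside `escape-chars` pass `chars` along explicitly, so the override applies at every level of the recursion.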
michael.hor257k
  • At a glance, this looks like it worked for what I need. That's just simple plug and play, however. Would you mind if I look through your solution, and come back here to ask any questions I have about what exactly you are doing in your example? – nyghtrunner Jun 22 '22 at 18:05
  • I believe the comments explain exactly what happens at each step - but if you need further clarifications, I'll try to answer. – michael.hor257k Jun 22 '22 at 18:07
  • The comments are definitely helpful. I think my primary question, just from an understanding perspective here... Your `escape-chars` template in effect loops by calling itself internally to process contents prior to, and also after each match to the $chars? The `substring-before()` kicks into the `<xsl:otherwise>` because there's no `contains()` match, and calling again what's leftover via the `substring-after()` processes the rest of the mtext content? – nyghtrunner Jun 22 '22 at 18:28
  • I am afraid I don't understand your questions. The named template indeed calls itself recursively. It starts by looking for the first character to escape. The part before this character is sent for further processing with the remaining characters. The part after this character is further processed with the current characters. When all characters have been exhausted, the string is returned *as is*. But that's exactly what the comments say. – michael.hor257k Jun 22 '22 at 18:43
  • I was just trying to make sure that I understood exactly what was happening during that sequence. Thank you again for the help! – nyghtrunner Jun 22 '22 at 18:59