I'm pretty new to using XSL/XSLT to perform XML transformations, and have a scenario I'm looking for some help with.
TL;DR: I am working on a C# solution to escape certain characters in MathML, specifically in <mtext> nodes. The characters include, but are not necessarily limited to, {, }, [, and ], which need to become \{, \}, \[, and \] respectively. Having seen some of the interesting things people have done with XSLT transformations, I figured I would give that a shot.
For reference, here's a sample block of MathML:
<math style='font-family:Times New Roman' xmlns='http://www.w3.org/1998/Math/MathML'>
<mstyle mathsize='15px'>
<mrow>
<mtext>4 ___ {</mtext>
<mtext mathvariant='italic'>x</mtext>
<mtext>: </mtext>
<mtext mathvariant='italic'>x</mtext>
<mtext> is a natural number greater than 4}</mtext>
</mrow>
</mstyle>
</math>
Fiddling around, I have found that with this XSL I can print out the contents of each <mtext> element:
<?xml version='1.0' encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*" />
</xsl:copy>
</xsl:template>
<xsl:template match="/">
<xsl:for-each select="//*[local-name()='mtext']">
<xsl:variable name="myMTextVal" select="text()" />
<xsl:message terminate="no">
<xsl:value-of select="$myMTextVal"/>
</xsl:message>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
My first thought, which quickly seemed to be the wrong road to go down, was to use translate() in the for-each loop as an XSLT 1.0 stand-in for XSLT 2.0's replace():
<!-- Outside of the looping template -->
<xsl:param name="braceOpen" select="'{'" />
<xsl:param name="braceOpenReplace" select="'\{'" />
<!-- In the loop itself -->
<xsl:value-of select="translate(//*[local-name()='mtext']/text(), $braceOpen, $braceOpenReplace)"/>
The limitation of translate(), which only does 1:1 character replacement, quickly became apparent when the first mtext's content started to display as "4 ___ \" rather than "4 ___ \{".
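To make sure I wasn't misreading the behavior, here is a minimal standalone sketch of it (the literal input string is just the first mtext's content pasted in, so it can run against any XML input). As far as I understand the XPath 1.0 rules, translate() maps the Nth character of the second argument to the Nth character of the third, and surplus characters in the third argument are ignored:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- '{' is character 1 of the 2nd argument, so it maps to character 1
         of the 3rd argument, which is '\'. The trailing '{' in the 3rd
         argument has no counterpart in the 2nd and is ignored. -->
    <xsl:value-of select="translate('4 ___ {', '{', '\{')"/>
  </xsl:template>
</xsl:stylesheet>
```

So translate() can swap one character for another, but it cannot grow one character into two, which is exactly what escaping needs.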
So digging some more, I ran across these threads:
XSLT Replace function not found
both of which offered an alternative solution in lieu of replace(). So I set up a test of:
<xsl:template name="ProcessMathText">
<xsl:param name="text"/>
<xsl:param name="replace"/>
<xsl:param name="by"/>
<xsl:choose>
<xsl:when test="contains($text,$replace)">
<xsl:value-of select="substring-before($text,$replace)"/>
<xsl:value-of select="$by"/>
<xsl:call-template name="ProcessMathText">
<xsl:with-param name="text" select="substring-after($text,$replace)"/>
<xsl:with-param name="replace" select="$replace"/>
<xsl:with-param name="by" select="$by"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$text"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
and placed this within the for-each block:
<xsl:otherwise>
<xsl:variable name="mTextText" select="Text" />
<xsl:call-template name="ProcessMathText">
<xsl:with-param name="text" select="$mTextText"/>
<xsl:with-param name="replace" select="'{'"/>
<xsl:with-param name="by" select="'\{'"/>
</xsl:call-template>
</xsl:otherwise>
However, that began to throw "'xsl:otherwise' cannot be a child of the 'xsl:for-each' element." errors. Ultimately, I'm not 100% sure how to "invoke" the <xsl:otherwise> content as stated in the links above without it being within the for-each block, which I'm kind of wired to do based on my history with AS, JS, Python, and C#. I was hoping someone might be able to help me out, or point me in a direction that might yield results, rather than me just banging my head against a wall.
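In case it helps frame the question, here is a rough sketch of the shape I suspect I should be aiming for. It assumes (my guess, not something the linked threads confirm for my case) that the call belongs in its own template matching the text nodes inside mtext, with the identity template copying everything else, rather than inside a for-each. The m prefix and the step1 variable are names I made up for illustration, and only { and } are handled:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://www.w3.org/1998/Math/MathML">
  <xsl:output method="xml" indent="no"/>

  <!-- Identity template: copy every node and attribute unchanged by default -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific rule: only text nodes directly inside mtext get escaped.
       "m" is a hypothetical prefix bound to the MathML namespace above. -->
  <xsl:template match="m:mtext/text()">
    <!-- First pass: { becomes \{ ("step1" is an illustrative name) -->
    <xsl:variable name="step1">
      <xsl:call-template name="ProcessMathText">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="replace" select="'{'"/>
        <xsl:with-param name="by" select="'\{'"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- Second pass on the result of the first: } becomes \} -->
    <xsl:call-template name="ProcessMathText">
      <xsl:with-param name="text" select="string($step1)"/>
      <xsl:with-param name="replace" select="'}'"/>
      <xsl:with-param name="by" select="'\}'"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Recursive search-and-replace, same as the template from my test above -->
  <xsl:template name="ProcessMathText">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="by"/>
    <xsl:choose>
      <xsl:when test="contains($text, $replace)">
        <xsl:value-of select="substring-before($text, $replace)"/>
        <xsl:value-of select="$by"/>
        <xsl:call-template name="ProcessMathText">
          <xsl:with-param name="text" select="substring-after($text, $replace)"/>
          <xsl:with-param name="replace" select="$replace"/>
          <xsl:with-param name="by" select="$by"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

If that is the right idea, [ and ] would presumably need two more passes chained the same way, which feels clunky, so I'd welcome a cleaner pattern.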
One other possible issue I have noticed on the output... It looks like the transformation results in losing the HTML entity characters such as  , and having them replaced with " ", which is something I do not want, as that could cause some annoying headaches down the line. Is there a way to maintain the structure, and only replace specific content, without accidentally replacing or in a sense "rendering" HTML entities?
Thanks in advance for your help!