I understand (after some pain...) that the translate function will not handle multibyte Unicode characters. I am looking for another way to remove all accents from characters. As a sample, I have the following transform and its output:
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:variable name="RSEP" select="'&#10;'"/> <!-- LF -->
<xsl:template match="/">
<xsl:variable name="testwords" select="'à wɔ́rɔ, yɛrɛ, wùri'"/>
<xsl:value-of select="$testwords"/>
<xsl:value-of select="$RSEP"/>
<xsl:value-of select="translate($testwords,
'àáèéɛ̀ɛ́ɔɔ̀ɔ́ìíòóuùú',
'aaeeɛɛɔɔɔiioouuu')"/>
<xsl:value-of select="$RSEP"/>
<xsl:value-of select="normalize-unicode($testwords)"/>
<xsl:value-of select="$RSEP"/>
<xsl:value-of select="replace(normalize-unicode($testwords, 'NFKD'), '\P{IsBasicLatin}', '')"/>
<xsl:value-of select="$RSEP"/>
</xsl:template>
</xsl:stylesheet>
Output with xslt3:
à wɔ́rɔ, yɛrɛ, wùri
a wɔɔrɔ, yɛrɛ, wri
à wɔ́rɔ, yɛrɛ, wùri
a wr, yr, wuri
I realize the translate function is not expected to work. But normalize-unicode does not seem to change the string at all. And the replace() expression I found elsewhere only seems to handle the standard Western European accented characters: it removes the multibyte characters (ɔ, ɛ) altogether instead of just their accents.
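A likely explanation for all three results, assuming the sample text is stored in decomposed form (the translate() output suggests it is): Unicode has no precomposed character for ɔ or ɛ with an accent, so ɔ́ is two code points, U+0254 followed by the combining acute accent U+0301. translate() maps single code points, so the combining marks inside its second argument push the mapping out of alignment with the third argument, and characters past the end of the third argument (such as ù) are deleted. normalize-unicode() with no second argument applies NFC, which only recomposes where a precomposed character exists, so the visible text does not change. Finally, \P{IsBasicLatin} deletes every code point outside ASCII, base letters included. The minimal stylesheet below, with illustrative sample values written as character references, prints the code points so this can be checked:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <!-- Illustrative samples: open o + combining acute (decomposed), and precomposed a-grave -->
    <xsl:variable name="samples" select="('&#x254;&#x301;', '&#xE0;')"/>
    <xsl:for-each select="$samples">
      <!-- Code points as written -->
      <xsl:value-of select="string-join(for $c in string-to-codepoints(.) return string($c), ' ')"/>
      <xsl:text> | </xsl:text>
      <!-- Code points after the default normalization (NFC) -->
      <xsl:value-of select="string-join(for $c in string-to-codepoints(normalize-unicode(.)) return string($c), ' ')"/>
      <xsl:text> | </xsl:text>
      <!-- Code points after canonical decomposition (NFD) -->
      <xsl:value-of select="string-join(for $c in string-to-codepoints(normalize-unicode(., 'NFD')) return string($c), ' ')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With any XSLT 2.0 or later processor this prints `596 769 | 596 769 | 596 769` for the open o (NFC cannot compose it) and `224 | 224 | 97 768` for à (NFD splits it into a plus a combining grave).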
I have a feeling this may require some kind of regex, but I am not sure how to go about it. Any help is appreciated.
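A minimal sketch of the regex approach, assuming the goal is to drop the accents while keeping base letters such as ɔ and ɛ: decompose with NFD so that precomposed characters like à and ù split into a base letter plus a combining mark, then delete only the nonspacing marks with the regex category \p{Mn} rather than everything matched by \P{IsBasicLatin}. The function name f:strip-accents and its namespace URI are hypothetical, chosen for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:strip-accents">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Hypothetical helper: decompose to NFD, then remove nonspacing marks (\p{Mn}).
       Open o (U+0254) and open e (U+025B) have no decomposition, so they are kept. -->
  <xsl:function name="f:strip-accents" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="replace(normalize-unicode($s, 'NFD'), '\p{Mn}', '')"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:variable name="testwords" select="'à wɔ́rɔ, yɛrɛ, wùri'"/>
    <xsl:value-of select="f:strip-accents($testwords)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

On the sample string this should print `a wɔrɔ, yɛrɛ, wuri`. If other kinds of combining marks show up in the data, the broader category \p{M} can be used in place of \p{Mn}.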
Thanks!