12

I want to remove all non-alphabetic characters from a string in XSLT. For example:

<Name>O'Niel</Name> = <Name>ONiel</Name>
<Name>St Peter</Name> = <Name>StPeter</Name>
<Name>A.David</Name> = <Name>ADavid</Name>

Can regular expressions be used in XSLT to do this? What is the right way to implement it?

EDIT: This needs to be done in XSLT 1.0.

Amzath
  • Check my answer for how to do it without regular expressions, which aren't supported in XSLT/XPath 1.0, by the way. – Flack Feb 22 '11 at 22:15

4 Answers

24

There is a pure XSLT way to do this.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>
    <xsl:variable name="vAllowedSymbols"
        select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'"/>
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="text()">
        <xsl:value-of select="
            translate(
                .,
                translate(., $vAllowedSymbols, ''),
                ''
                )
            "/>
    </xsl:template>
</xsl:stylesheet>

Result against this sample:

<t>
    <Name>O'Niel</Name>
    <Name>St Peter</Name>
    <Name>A.David</Name>
</t>

Will be:

<t>
    <Name>ONiel</Name>
    <Name>StPeter</Name>
    <Name>ADavid</Name>
</t>
Flack
  • @Flack - This is great. My solution was useful for replacing single characters in a blacklist with any other sequence of characters, but `translate` is a much better solution for applying a whitelist or simple one-to-one replacements. – Wayne Feb 22 '11 at 22:10
  • @lwburk. You can use `translate` for a "blacklist" as well. It would be even simpler. There is no need for recursion when you don't need a "string to another string" replacement. – Flack Feb 22 '11 at 22:14
  • How do I apply this template to a node like /Application/Contact/FirstName? – Amzath Feb 22 '11 at 22:15
  • @Amzath. You can basically use an XPath expression starting with `translate` wherever you take the string-values of nodes and attributes. – Flack Feb 22 '11 at 22:17
  • @Flack - Right, that's part of what I was trying to say, which is ultimately why I deleted my post. You need the recursion approach in XSLT 1.0 for doing more complicated replacements, but it was complete overkill in this case. – Wayne Feb 22 '11 at 22:20
  • How do I add ' (apostrophe) as an allowed special character? – Pawan Lakhotia Apr 23 '20 at 06:59
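
The two follow-up questions in the comments above (restricting the cleanup to one node, and allowing an apostrophe) could be handled along these lines. This is only a sketch for XSLT 1.0, not part of the original answer: it assumes the document really has the /Application/Contact/FirstName structure mentioned in the comment, and it declares the allowed characters as the variable's text content rather than in a `select` attribute so that the apostrophe needs no quoting tricks.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Allowed characters given as element content, so a literal
         apostrophe can simply be appended at the end. -->
    <xsl:variable name="vAllowedSymbols">ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'</xsl:variable>

    <!-- Identity template: copies everything else unchanged. -->
    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

    <!-- Assumed path from the comment: only text inside FirstName is cleaned. -->
    <xsl:template match="/Application/Contact/FirstName/text()">
        <xsl:value-of select="translate(., translate(., $vAllowedSymbols, ''), '')"/>
    </xsl:template>
</xsl:stylesheet>
```

With this variant a FirstName of O'Niel would stay O'Niel, while other punctuation and spaces in that element would still be removed. Text in all other elements is copied through untouched.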
13

Here's a 2.0 option:

EDIT: Sorry...the 1.0 requirement was added after I started on my answer.

XML

<?xml version="1.0" encoding="UTF-8"?>
<doc>
  <Name>O'Niel</Name>
  <Name>St Peter</Name>
  <Name>A.David</Name>
</doc>

XSLT 2.0

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="*|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:value-of select="replace(.,'[^a-zA-Z]','')"/>
  </xsl:template>

</xsl:stylesheet>

Output

<?xml version="1.0" encoding="UTF-8"?>
<doc>
   <Name>ONiel</Name>
   <Name>StPeter</Name>
   <Name>ADavid</Name>
</doc>

Here are a couple more ways of using replace()...

Using "i" (case-insensitive mode) flag:

replace(.,'[^A-Z]','','i')

Using category escapes:

replace(.,'\P{L}','')
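
As a rough illustration of how the character class and the category escape differ, here is a small XSLT 2.0 sketch. The sample name is made up for this example and is assumed to use precomposed accented characters.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- Hypothetical sample value, not taken from the question. -->
        <xsl:variable name="vSample" select="'José-María'"/>

        <!-- Keeps only ASCII letters, so the accented letters are dropped too. -->
        <xsl:value-of select="replace($vSample, '[^a-zA-Z]', '')"/>
        <xsl:text>&#10;</xsl:text>

        <!-- Keeps anything in Unicode category L (letters), accents included. -->
        <xsl:value-of select="replace($vSample, '\P{L}', '')"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

The first call should give JosMara and the second JoséMaría, so `\P{L}` is probably the better fit when names may contain non-English letters.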
Daniel Haley
3

I just created a function based on the code in this example...

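<!-- xsl:function requires XSLT 2.0; the lancet prefix must be bound to a namespace on xsl:stylesheet. -->
<xsl:function name="lancet:stripSpecialChars">
    <xsl:param name="string"/>
    <!-- Characters to keep; everything else is treated as "special". -->
    <xsl:variable name="AllowedSymbols" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()*%$#@!~&lt;&gt;,.?[]=- +   /\ '"/>
    <!-- The inner translate() collects the disallowed characters found in $string.
         Because the replacement string is a single space, only the first of those
         characters is turned into a space; the rest are deleted. -->
    <xsl:value-of select="
        translate(
            $string,
            translate($string, $AllowedSymbols, ''),
            ' ')
        "/>
</xsl:function>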

and an example of the usage would be as follows:

<xsl:value-of select="lancet:stripSpecialChars($string)"/>
1

The quickest way is:

<xsl:value-of select="translate(Name,translate(Name,'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',''),'')" />

The inner translate removes the letters (the characters to keep) from the string, leaving only the unwanted characters. The outer translate then removes exactly those unwanted characters from the original string.
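
To make the two steps concrete, here is a minimal XSLT 1.0 sketch that traces them for one value. The variable name `vLetters` and the text output are choices made for this illustration only. Note that `Name` in the one-liner above is evaluated relative to the current node, and if it selects several nodes only the first one's string-value is used, so the sketch loops over each Name instead.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text"/>
    <xsl:variable name="vLetters"
        select="'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

    <xsl:template match="/">
        <xsl:for-each select="//Name">
            <!-- For the value O'Niel:
                 inner translate(., $vLetters, '') leaves only the apostrophe;
                 outer translate(., that apostrophe, '') yields ONiel. -->
            <xsl:value-of select="translate(., translate(., $vLetters, ''), '')"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Run against the sample from the question, this should print ONiel, StPeter and ADavid, one per line.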

nagarayan