I am trying to filter out problem characters (quotes and slashes) during an XSLT transformation, but I am unable to actually remove them. I've tried several solutions proposed here, without success:

Replace special characters in XSLT

Removing double quotes in XSL

XSL: replace single and double quotes with &apos; and &quot;

I would ideally like to replace the characters with some kind of marked word, like quotes or slash, but at this point I'd be fine just stripping them out for now.

I'm only running it on a couple of selects, so it shouldn't be that hard. I'm not sure what is going wrong.

<xsl:value-of select="ns3:stepTitle"/>

EDIT:

I need to use XSLT 1.0.

XSLT:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*/text()">
        <xsl:value-of select="translate(., '\&quot;', '*quote*')"/>
    </xsl:template>

</xsl:stylesheet>

XML:

<test>
  I need to remove "quotes" and slashes /\ from here.
</test>

The result was:

<?xml version="1.0" encoding="UTF-16"?>
<test>
  I need to remove qquotesq and slashes /* from here.
</test>
  • Do you have a [mcve] to show your exact problem? – zx485 May 22 '18 at 21:54
  • We certainly can't tell what you are doing wrong if you don't tell us what you are doing. If you show us your code, we can tell you where it's wrong. If you show us your input and desired output, we can suggest how to write the code. With neither input nor output nor code, there's not much we can do to help. – Michael Kay May 23 '18 at 06:59
  • Also, we need to know whether you can use XSLT 2.0 (or 3.0) since that makes most things a lot easier. – Michael Kay May 23 '18 at 07:00
  • You are correct. I am sorry about that. I need to use XML 1.0 for this. Please see the edit to the question for clarification. – Joseph Shea May 23 '18 at 15:53

2 Answers

It's not 100% clear what your problem is, but I'm guessing it is a variant of the problem described in this old thread from 2001. If so, the following is an example XSLT 1.0 stylesheet that replaces ASCII apostrophe characters with U+2019 RIGHT SINGLE QUOTATION MARK characters (Unicode code point 8217 in decimal). The "trick" is to define a variable holding a single-character string containing the apostrophe, and then use that variable in calls to translate(). The same variable can also be used with concat() to build strings that contain apostrophes:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="text()">
    <xsl:variable name="apos" select='"&apos;"'/>
    <xsl:variable name="string-containing-quotes" select="."/>
    <xsl:variable name="string-with-quotes-replaced"
         select="translate($string-containing-quotes, $apos, '&#8217;')"/>
    <xsl:value-of select="$string-with-quotes-replaced"/>
  </xsl:template>
</xsl:stylesheet>

You can test the stylesheet with a test XML input document such as

<test>
  Text containing 'apostrophe' characters
</test>
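
The same variable trick should carry over to double quotes by swapping the delimiters: double quotes around the XML attribute and apostrophes around the XPath string literal. The following is a minimal sketch under that assumption, not a tested drop-in; the variable name `quot` is only illustrative. It deletes both kinds of quote by passing an empty third argument to translate():

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="text()">
    <!-- apostrophe: attribute delimited by ', XPath literal delimited by " -->
    <xsl:variable name="apos" select='"&apos;"'/>
    <!-- double quote: attribute delimited by ", XPath literal delimited by ' -->
    <xsl:variable name="quot" select="'&quot;'"/>
    <!-- empty third argument: every character in the second argument is deleted -->
    <xsl:value-of select="translate(., concat($apos, $quot), '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Building the second argument with concat() avoids having to fit both quote characters into a single string literal.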
imhotap
  • This is an interesting approach to the issue. Instead of replacing it with a single quote, I tried to replace it with *quote*, and the output came out as "Text containing *apostrophe* characters" Instead of "Text containing *quote*apostrophe*quote* characters" Do you know why this is happening? I figure if I can just use *quote* or other variation (*slash*, *greater than*, etc) it'll be easier since I am trying to turn this into RDF and many editors don't like some characters – Joseph Shea May 23 '18 at 15:34
  • @JosephShea to make it work for double quote chars, you need to declare the single-character string with double quote characters as the XML attribute (outer) delimiters, and single quotes/apostrophe characters for delimiting the XSLT string literal like so `` (and then use `quot` like `apos` in the example code in the answer) – imhotap May 23 '18 at 15:42
  • When I did that and tried to replace the quote with a string, it only seemed to take the first letter of the string and not the whole thing. Is there any way to have it replace the quote with a full string of several characters? – Joseph Shea May 23 '18 at 16:02
  • `translate()` replaces only single characters according to [the XPath 1.0 spec](https://www.w3.org/TR/xpath-10/); if you want to replace a single character by a multi-character string you would have to use something else – imhotap May 23 '18 at 16:20
  • See e.g. https://stackoverflow.com/questions/7520762/xslt-1-0-string-replace-function#7523245 for replacing multi-character strings in XSLT 1.0; but since you stated you "want to filter-out problem characters", perhaps you can still use `translate()` with the third parameter left empty, in which case XSLT will *remove* characters – imhotap May 23 '18 at 16:30
  • Yes, that is the approach that I am going to take. Thank you for the help – Joseph Shea May 23 '18 at 17:01
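
For the multi-character replacement the question originally wanted (a marker word such as `*quote*`), `translate()` is not enough in XSLT 1.0, because it only maps one character to one character. A common workaround is a recursive named template built on `contains()`, `substring-before()` and `substring-after()`. The sketch below is one way to write it; the template name `replace-all`, its parameter names, and the marker `*slash*` are assumptions made for illustration, not part of any library.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: replaces every occurrence of $from in $text with $to -->
  <xsl:template name="replace-all">
    <xsl:param name="text"/>
    <xsl:param name="from"/>
    <xsl:param name="to"/>
    <xsl:choose>
      <xsl:when test="$from != '' and contains($text, $from)">
        <!-- emit the part before the match, then the replacement, then recurse on the rest -->
        <xsl:value-of select="substring-before($text, $from)"/>
        <xsl:value-of select="$to"/>
        <xsl:call-template name="replace-all">
          <xsl:with-param name="text" select="substring-after($text, $from)"/>
          <xsl:with-param name="from" select="$from"/>
          <xsl:with-param name="to" select="$to"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- no (further) match: emit the remaining text unchanged -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- identity template: copy everything else as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <xsl:template match="*/text()">
    <!-- first pass: each " becomes *quote* -->
    <xsl:variable name="no-quotes">
      <xsl:call-template name="replace-all">
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="from" select="'&quot;'"/>
        <xsl:with-param name="to" select="'*quote*'"/>
      </xsl:call-template>
    </xsl:variable>
    <!-- second pass: each / becomes *slash* -->
    <xsl:call-template name="replace-all">
      <xsl:with-param name="text" select="string($no-quotes)"/>
      <xsl:with-param name="from" select="'/'"/>
      <xsl:with-param name="to" select="'*slash*'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the sample input from the question, this should turn `"quotes"` into `*quote*quotes*quote*` and `/` into `*slash*`, leaving the backslash alone; a third pass with `'\'` as the `from` value would cover it. Each additional character costs one more pass, so for plain stripping the empty-third-argument `translate()` call remains the simpler choice.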

A feature of the translate() function that is perhaps not widely enough known is that the replace string (3rd argument) can be shorter than the from string (2nd argument).

In such a case, characters from the source string (1st argument) which:

  • occur in the from string,
  • but have no corresponding characters in the replace string

are deleted.

So you have to use translate(., '/&quot;', '').

The from string has 2 characters, a slash (/) and a double quote (&quot;), and the replace string is empty, so both of these characters will be deleted.

The example script is below:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="*/text()">
    <xsl:value-of select="translate(., '/&quot;', '')"/>
  </xsl:template>

  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>

Note: the translate() call in your example used a backslash (\), not a forward slash (/), which is why the forward slash was left untouched in your result.
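
Because translate() pairs characters by position, the call in the question, translate(., '\&quot;', '*quote*'), mapped \ to * and " to q and ignored the rest of '*quote*'; that is where qquotesq and /* in the result came from. Since the sample input contains both kinds of slash, here is a variant of the script that deletes all three characters. It is only a sketch, assuming both slash directions should be stripped:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="*/text()">
    <!-- from string: / then \ then "; the replace string is empty, so all three are deleted -->
    <xsl:value-of select="translate(., '/\&quot;', '')"/>
  </xsl:template>

  <!-- identity template: copy everything else as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The backslash needs no escaping inside an XPath 1.0 string literal, so it can be written as-is in the from string.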

Valdi_Bo