Hi, I have a scenario where I need to remove special characters as well as Latin characters. I was able to strip out the Latin characters and a few special characters, but for some reason ™ is getting converted to TM. How do I remove that using XSLT? Here are my code and function:

 <Last_Name xtt:fixedLength="30" xtt:required="true" xtt:severity="error" xtt:align="left"><xsl:value-of select="lancet:stripSpecialChars(replace(normalize-unicode(translate(wd:Last_Name, ',', ''), 'NFKD'), '⁄', '/'))"/></Last_Name>

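Function:

<xsl:function name="lancet:stripSpecialChars">
  <xsl:param name="string"/>
  <!-- Declared but not used by the expression below. -->
  <xsl:variable name="AllowedSymbols" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789()*%$#@!~&lt;&gt;™,.?[]=- +   /\ '"/>
  <!-- NFKD decomposes compatibility characters (™ becomes TM); everything outside Basic Latin is then removed. -->
  <xsl:value-of select="replace(normalize-unicode($string, 'NFKD'), '\P{IsBasicLatin}', '')"/>
</xsl:function>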

What am I expecting?

INPUT: DE’ERIKA

OUTPUT (right now, with my code): DEATMERIKA

EXPECTED OUTPUT: DEAERIKA (my code is eliminating Latin characters and a few symbols)

Gopi Naidu
  • Why? You should fix your code to handle Unicode instead. – SLaks Nov 29 '17 at 19:55
  • Your clients and partners are going to be _very_ upset if you remove their trademark assertions. That could get you into legal trouble. Don't do that. – msanford Nov 29 '17 at 19:58
  • And how is JavaScript/Java involved? – epascarello Nov 29 '17 at 20:02
  • @msanford This happens when a file is fed from one system into another; I am not removing trademarks. For example, when I send a file from system X to system Y, the file generated by X can contain certain special characters that have to be removed before it can be loaded into Y. – Gopi Naidu Nov 29 '17 at 20:02
  • @GopiNaidu Do you mean that those characters are _added incorrectly by some process but not in the original data_? – msanford Nov 29 '17 at 20:04
  • Yes. They may have been added by an employee during onboarding, while completing personal information, etc. – Gopi Naidu Nov 29 '17 at 20:07

1 Answer

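You're seeing these characters because the wrong code page is being used somewhere: the input XML is encoded as UTF-8, but the system reading it assumes a single-byte code page instead (’ is what the UTF-8 bytes of a single character look like when read as Windows-1252, rather than plain ASCII). The solution is to perform a conversion or to make the receiving application use UTF-8.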
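The mismatch and the conversion can be reproduced outside XSLT. The following is a minimal Python sketch, not part of the original stylesheet; it assumes the bytes were read as Windows-1252 (`cp1252`), which the question does not state but which matches the ’ sequence.

```python
# The name as typed, with U+2019 RIGHT SINGLE QUOTATION MARK.
original = "DE\u2019ERIKA"

# UTF-8 stores U+2019 as three bytes: E2 80 99.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" "))  # 44 45 e2 80 99 45 52 49 4b 41

# Assumption: the receiving side reads those bytes as Windows-1252,
# so each byte becomes its own character: U+00E2, U+20AC, U+2122.
garbled = utf8_bytes.decode("cp1252")
print([hex(ord(c)) for c in garbled[2:5]])  # ['0xe2', '0x20ac', '0x2122']

# The conversion: undo the wrong decoding, then decode as UTF-8.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

The round trip only works while the garbled text is still intact; once characters have been normalized or stripped, the original bytes can no longer be recovered.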

Do not delete the ’ characters! If a user enters a character that is not present in ASCII, e.g. a letter with an accent (ï), a system that misreads the encoding will render something like ’ in its place.

Specifically, ’ is a right single quotation mark (’, U+2019). So the user input was:

DE’ERIKA (which is a valid name in many languages).

Rendering this as DEERIKA or (worse) DEATMERIKA is incorrect!
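How the TM arises can be traced step by step. This is a rough Python sketch of the same two operations the question's function applies (NFKD normalization, then removal of everything outside Basic Latin); the helper name `strip_non_basic_latin` is hypothetical, and the garbled input assumes a Windows-1252 misreading. It yields a lower-case a where the question reports an upper-case A, a difference this sketch does not try to explain.

```python
import re
import unicodedata


def strip_non_basic_latin(text: str) -> str:
    # Same idea as the XSLT expression in the question:
    # NFKD-normalize, then drop every character above U+007F.
    decomposed = unicodedata.normalize("NFKD", text)
    return re.sub(r"[^\x00-\x7f]", "", decomposed)


# Garbled form: U+00E2 (a with circumflex), U+20AC (euro sign), U+2122 (trade mark sign).
garbled = "DE\u00e2\u20ac\u2122ERIKA"
# NFKD turns U+00E2 into "a" + combining circumflex and U+2122 into "TM";
# the circumflex and the euro sign are then stripped.
print(strip_non_basic_latin(garbled))  # DEaTMERIKA

# Correctly decoded form: U+2019 has no decomposition, so it is simply dropped.
correct = "DE\u2019ERIKA"
print(strip_non_basic_latin(correct))  # DEERIKA
```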

If you remove these characters, you're deleting part of the input. That's like proposing to change your name to "Gop Nadu" because your system can't render the 'i'.

Related question that explains what happens

Hobbes