1

Is there a simple way to remove extended ASCII characters in a varchar(max). I want to remove all ASCII characters from 128 onwards. eg - ù,ç,Ä

I have tried this solution and its not working, I think its because they are still valid ASCII characters?

How do I remove extended ASCII characters from a string in T-SQL?

Thanks

Community
  • 1
  • 1
SkipFeeney
  • 61
  • 1
  • 6
  • Is this gonna be a part of a trigger code? To avoid those values in the future? – jarlh Mar 23 '16 at 11:31
  • 3
    Use the linked answer, change `IF UNICODE(@nchar) between 32 and 255` to `IF UNICODE(@nchar) < 128` – Alex K. Mar 23 '16 at 11:39
  • The values are already in a column so either a function or Update statement will do? – SkipFeeney Mar 23 '16 at 11:41
  • @AlexK., as the linked answer is using a loop, I think this might be done better (see my answer) – Shnugo Mar 23 '16 at 12:17
  • 2
    Possible duplicate of [How do I remove extended ASCII characters from a string in T-SQL?](http://stackoverflow.com/questions/15259622/how-do-i-remove-extended-ascii-characters-from-a-string-in-t-sql) – Shnugo Mar 24 '16 at 08:11

1 Answers1

3

The linked solution is using a loop which is - if possible - something you should avoid.

My solution is completely inlineable, it's easy to create an UDF (or maybe even better: an inline TVF) from this.

The idea: Create a set of running numbers (here it's limited with the count of objects in sys.objects, but there are tons of example how to create a numbers tally on the fly). In the second CTE the strings are splitted to single characters. The final select comes back with the cleaned string.

DECLARE @tbl TABLE(ID INT IDENTITY, EvilString NVARCHAR(100));
INSERT INTO @tbl(EvilString) VALUES('ËËËËeeeeËËËË'),('ËaËËbËeeeeËËËcË');

WITH RunningNumbers AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr
    FROM sys.objects
)
,SingleChars AS
(
    SELECT tbl.ID,rn.Nmbr,SUBSTRING(tbl.EvilString,rn.Nmbr,1) AS Chr
    FROM @tbl AS tbl
    CROSS APPLY (SELECT TOP(LEN(tbl.EvilString)) Nmbr FROM RunningNumbers) AS rn 
)
SELECT ID,EvilString
      ,(
        SELECT '' + Chr 
        FROM SingleChars AS sc
        WHERE sc.ID=tbl.ID AND ASCII(Chr)<128
        ORDER BY sc.Nmbr
        FOR XML PATH('')
      ) AS GoodString
FROM @tbl As tbl

The result

1   ËËËËeeeeËËËË    eeee
2   ËaËËbËeeeeËËËcË abeeeec

Here is another answer from me where this approach is used to replace all special characters with secure characters to get plain latin

Community
  • 1
  • 1
Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • First +1 for the good solution and the alias "EvilString"... I know this is an old thread but... This will blow up if the string contains reserved XML characters or any spaces. I would suggest changing `FOR XML PATH('')` to `FOR XML PATH(''),TYPE` and including `.value('(text())[1]','varchar(8000)')` right before your `AS GoodString` alias; e.g. change `...) AS GoodString` to `...).value('(text())[1]','varchar(8000)') AS GoodString`. Since this post 2017 added STRING_AGG which circumvents the XML character issue. – Alan Burstein Apr 18 '19 at 22:58