How to remove accents and all chars <> a..z in sql-server?

Question

I need to do the following modifications to a varchar(20) field:

substitute accents with normal letters (like è to e)
after (1) remove all the chars not in a..z

for example

'aèàç=.32s df'

must become

'aeacsdf'

are there special stored functions to achieve this easily?

UPDATE: please provide a T-SQL not CLR solution. This is the workaround I temporarly did because it temporarly suits my needs, anyway using a more elegant approach would be better.

CREATE FUNCTION sf_RemoveExtraChars (@NAME nvarchar(50))
RETURNS nvarchar(50)
AS
BEGIN
  declare @TempString nvarchar(100)
  set @TempString = @NAME 
  set @TempString = LOWER(@TempString)
  set @TempString =  replace(@TempString,' ', '')
  set @TempString =  replace(@TempString,'à', 'a')
  set @TempString =  replace(@TempString,'è', 'e')
  set @TempString =  replace(@TempString,'é', 'e')
  set @TempString =  replace(@TempString,'ì', 'i')
  set @TempString =  replace(@TempString,'ò', 'o')
  set @TempString =  replace(@TempString,'ù', 'u')
  set @TempString =  replace(@TempString,'ç', 'c')
  set @TempString =  replace(@TempString,'''', '')
  set @TempString =  replace(@TempString,'`', '')
  set @TempString =  replace(@TempString,'-', '')
  return @TempString
END
GO

just fyi, you don't need to use nvarchar to have accents on your characters. unless you absolutely need it, i recommend converting to varchar — DForck42, Oct 26 '10 at 18:12
NVARCHAR is used to hold Unicode characters. Accented characters are not necessarily Unicode (and most likely they are not). — Alek Davis, Oct 26 '10 at 20:02
yes I know, but the DB I am working on is for policy all nvarchar (bizzarre policy) — UnDiUdin, Oct 27 '10 at 07:26

score 126 · Answer 1 · edited Nov 28 '17 at 13:02

126

The best way to achieve this is very simple and efficient :

SELECT 'àéêöhello!' Collate SQL_Latin1_General_CP1253_CI_AI

which outputs 'aeeohello!'

The string must not be unicode. If you have a nvarchar just cast it to varchar before using the collate.

Here is a function that answers the OP needs :

create function [dbo].[RemoveExtraChars] ( @p_OriginalString varchar(50) )
returns varchar(50) as
begin

  declare @i int = 1;  -- must start from 1, as SubString is 1-based
  declare @OriginalString varchar(100) = @p_OriginalString Collate SQL_Latin1_General_CP1253_CI_AI;
  declare @ModifiedString varchar(100) = '';

  while @i <= Len(@OriginalString)
  begin
    if SubString(@OriginalString, @i, 1) like '[a-Z]'
    begin
      set @ModifiedString = @ModifiedString + SubString(@OriginalString, @i, 1);
    end
    set @i = @i + 1;
  end

  return @ModifiedString

end

Then, the command:

select dbo.RemoveExtraChars('aèàç=.32s df')

outputs

aeacsdf

edited Nov 28 '17 at 13:02

Community

1
1

answered Oct 03 '12 at 19:03

Dominic Goulet

7,983
7
28
56

2

Thanks for upvoting ;-) I really wonder why no one ever use that method, it saved me countless times as I work in french environments. – Dominic Goulet Jan 07 '13 at 12:59
This is really great. Saved me a lot of time. More people should be upvoting this. Actually, this should be the accepted answer since it directly answers the question with a piece of code that works. – paracycle Feb 06 '13 at 18:17
@paracycle I didn't vote for it because it doesn't fully answer the OP's question. Part 2 of the requirement is to strip any character outside the range a..z. The proposed solution `SELECT 'aèàç=.32s df' COLLATE SQL_Latin1_General_CP1253_CI_AI;` outputs `aeac=.32s df`. Rhe `=`, `.`, `3`, `2`, and `` are invalid characters. – Iain Samuel McLean Elder Feb 19 '13 at 10:18
**while @i < Len(@OriginalString)** has to be: **while @i <= Len(@OriginalString)** else it will skip the last letter in the string. Thanks for the solution though! – Tim Geerts Mar 10 '14 at 14:41
The part about converting to varchar in not prominent enough, especially because this answer was not selected, but is the best one in my opinion. select family_name,CONVERT(VARCHAR(99),family_name) collate SQL_Latin1_General_Cp1251_CS_AS FROM people – carlo.borreo Aug 26 '16 at 13:43
@Dominic_Goulet your function does not work if the database has a sensitive collation like CS_AS or binary... You need to add a COLLATE French_CI_AI (as an example) to the line IF SUBSTRING... to correct the code – SQLpro Feb 06 '22 at 08:41

score 7 · Answer 2 · answered Jan 21 '14 at 18:03

7

    SELECT 'áéíóú' COLLATE Cyrillic_General_CI_AI

This will replace all accented chars...

result: aeiou

Hope this help you!

answered Jan 21 '14 at 18:03

Eduardo Caltempa

71
1
1

Interesting, it works well! I have tried to replace some other characteres like the "º" (for degrees representation), but it didn't work as well. – Vladimir Jan 04 '18 at 01:32

Alek Davis · Answer 3 · 2010-10-26T18:56:49.807

Let me clarify something first: the accented characters you show are not actually Unicode (as one answer implies); these are 8-bit ASCII characters. One thing to keep in mind: you see characters like è and à simply because this is how your code page (the code page used by your OS and/or SQL Server [I'm not sure which one]) displays them. In a different code page, these characters would be represented by totally different symbols (e.g. if you use a Cyrillic or Turkish code page).

Anyway, say you want to replace these 8-bit chars with the closest US/Latin character equivalent for your default code page [I assume these are characters from some variation of a Latin character set]. This is how I approached a similar problem (disclaimer: this is not a very elegant solution, but I could not think of anything better at the time):

Create a UDF to translate an 8-bit ASCII character to a 7-bit printable ASCII equivalent, such as:

CREATE FUNCTION dbo.fnCharToAscii
(
  @Char AS VARCHAR
)
RETURNS
  VARCHAR   
AS
BEGIN
IF (@Char IS NULL)
  RETURN ''

-- Process control and DEL chars.
IF (ASCII(@Char) < 32) OR (ASCII(@Char) = 127)
    RETURN ''

-- Return printable 7-bit ASCII chars as is.
-- UPDATE TO DELETE NON-ALPHA CHARS.
IF (ASCII(@Char) >= 32) AND (ASCII(@Char) < 127)
    RETURN @Char

-- Process 8-bit ASCII chars.
RETURN
  CASE ASCII(@Char)
    WHEN 128 THEN 'E'
    WHEN 129 THEN '?'
    WHEN 130 THEN ','
    WHEN 131 THEN 'f'
    WHEN 132 THEN ','
    WHEN 133 THEN '.'
    WHEN 134 THEN '+'
    WHEN 135 THEN '+'
    WHEN 136 THEN '^'
    WHEN 137 THEN '%'
    WHEN 138 THEN 'S'
    WHEN 139 THEN '<'
    WHEN 140 THEN 'C'
    WHEN 141 THEN '?'
    WHEN 142 THEN 'Z'
    WHEN 143 THEN '?'
    WHEN 144 THEN '?'
    WHEN 145 THEN ''''
    WHEN 146 THEN ''''
    WHEN 147 THEN '"'
    WHEN 148 THEN '"'
    WHEN 149 THEN '-'
    WHEN 150 THEN '-'
    WHEN 151 THEN '-'
    WHEN 152 THEN '~'
    WHEN 153 THEN '?'
    WHEN 154 THEN 's'
    WHEN 155 THEN '>'
    WHEN 156 THEN 'o'
    WHEN 157 THEN '?'
    WHEN 158 THEN 'z'
    WHEN 159 THEN 'Y'
    WHEN 160 THEN ' '
    WHEN 161 THEN 'i'
    WHEN 162 THEN 'c'
    WHEN 163 THEN 'L'
    WHEN 164 THEN '?'
    WHEN 165 THEN 'Y'
    WHEN 166 THEN '|'
    WHEN 167 THEN '$'
    WHEN 168 THEN '^'
    WHEN 169 THEN 'c'
    WHEN 170 THEN 'a'
    WHEN 171 THEN '<'
    WHEN 172 THEN '-'
    WHEN 173 THEN '-'
    WHEN 174 THEN 'R'
    WHEN 175 THEN '-'
    WHEN 176 THEN 'o'
    WHEN 177 THEN '+'
    WHEN 178 THEN '2'
    WHEN 179 THEN '3'
    WHEN 180 THEN ''''
    WHEN 181 THEN 'm'
    WHEN 182 THEN 'P'
    WHEN 183 THEN '-'
    WHEN 184 THEN ','
    WHEN 185 THEN '1'
    WHEN 186 THEN '0'
    WHEN 187 THEN '>'
    WHEN 188 THEN '?'
    WHEN 189 THEN '?'
    WHEN 190 THEN '?'
    WHEN 191 THEN '?'
    WHEN 192 THEN 'A'
    WHEN 193 THEN 'A'
    WHEN 194 THEN 'A'
    WHEN 195 THEN 'A'
    WHEN 196 THEN 'A'
    WHEN 197 THEN 'A'
    WHEN 198 THEN 'A'
    WHEN 199 THEN 'C'
    WHEN 200 THEN 'E'
    WHEN 201 THEN 'E'
    WHEN 202 THEN 'E'
    WHEN 203 THEN 'E'
    WHEN 204 THEN 'I'
    WHEN 205 THEN 'I'
    WHEN 206 THEN 'I'
    WHEN 207 THEN 'I'
    WHEN 208 THEN 'D'
    WHEN 209 THEN 'N'
    WHEN 210 THEN 'O'
    WHEN 211 THEN 'O'
    WHEN 212 THEN 'O'
    WHEN 213 THEN 'O'
    WHEN 214 THEN 'O'
    WHEN 215 THEN 'x'
    WHEN 216 THEN 'O'
    WHEN 217 THEN 'U'
    WHEN 218 THEN 'U'
    WHEN 219 THEN 'U'
    WHEN 220 THEN 'U'
    WHEN 221 THEN 'Y'
    WHEN 222 THEN 'b'
    WHEN 223 THEN 'B'
    WHEN 224 THEN 'a'
    WHEN 225 THEN 'a'
    WHEN 226 THEN 'a'
    WHEN 227 THEN 'a'
    WHEN 228 THEN 'a'
    WHEN 229 THEN 'a'
    WHEN 230 THEN 'a'
    WHEN 231 THEN 'c'
    WHEN 232 THEN 'e'
    WHEN 233 THEN 'e'
    WHEN 234 THEN 'e'
    WHEN 235 THEN 'e'
    WHEN 236 THEN 'i'
    WHEN 237 THEN 'i'
    WHEN 238 THEN 'i'
    WHEN 239 THEN 'i'
    WHEN 240 THEN 'o'
    WHEN 241 THEN 'n'
    WHEN 242 THEN 'o'
    WHEN 243 THEN 'o'
    WHEN 244 THEN 'o'
    WHEN 245 THEN 'o'
    WHEN 246 THEN 'o'
    WHEN 247 THEN '-'
    WHEN 248 THEN 'o'
    WHEN 249 THEN 'u'
    WHEN 250 THEN 'u'
    WHEN 251 THEN 'u'
    WHEN 252 THEN 'u'
    WHEN 253 THEN 'y'
    WHEN 254 THEN 'b'
    WHEN 255 THEN 'y'
  END
RETURN ''
END

The code above is general-purpose, so you can adjust the character mappings to remove all non-alphabetic characters, e.g. you can use code like this in the match for printable 7-bit ASCII character (this assumes case-insensitive collation):

IF @Char NOT LIKE '[a-z]' RETURN ''

To see if your character mapping for 8-bit ASCII symbols works correctly, run the following code:

DECLARE @I   INT
DECLARE @Msg VARCHAR(32)

SET @I = 128

WHILE @I < 256
BEGIN
    SELECT @Msg = CAST(@I AS VARCHAR) + 
    ': ' + 
    CHAR(@I) + 
    '=' + 
    dbo.fnCharToAscii(CHAR(@I))
    PRINT @Msg
    SET @I = @I + 1 
END

Now you can create a UDF to process a string:

CREATE FUNCTION dbo.fnStringToAscii
(
  @Value AS VARCHAR(8000)
)
RETURNS
  VARCHAR(8000) 
AS
BEGIN
IF (@Value IS NULL OR DATALENGTH(@Value) = 0)
  RETURN ''

DECLARE @Index  INT
DECLARE @Result VARCHAR(8000)

SET @Result = ''
SET @Index  = 1

WHILE (@Index <= DATALENGTH(@Value))
BEGIN
  SET @Result = @Result + dbo.fnCharToAscii(SUBSTRING(@Value, @Index, 1))
  SET @Index = @Index + 1   
END

RETURN @Result
END
GO

score 5 · Accepted Answer · answered Sep 14 '12 at 22:56

You can avoid hard-coded REPLACE statements by using a COLLATE clause with an accent-insensitive collation to compare the accented alphabetic characters to non-alphabetic ones:

DECLARE 
  @s1 NVARCHAR(200),
  @s2 NVARCHAR(200)

SET @s1 = N'aèàç=.32s df' 

SET @s2 = N''
SELECT @s2 = @s2 + no_accent 
FROM ( 
  SELECT 
    SUBSTRING(@s1, number, 1) AS accent,
    number
  FROM master.dbo.spt_values 
  WHERE TYPE = 'P'
  AND number BETWEEN 1 AND LEN(@s1) 
) s1 
INNER JOIN (
  SELECT NCHAR(number) AS no_accent
  FROM master.dbo.spt_values 
  WHERE type = 'P'
  AND (number BETWEEN 65 AND 90 OR number BETWEEN 97 AND 122) 
) s2 
  ON s1.accent COLLATE LATIN1_GENERAL_CS_AI = s2.no_accent 
ORDER BY number

SELECT @s1 
SELECT @s2 

/*
aèàç=.32s df
aeacsdf
*/

DForck42 · Answer 5 · 2010-10-27T14:04:54.057

3

Well, this isn't a whole lot better, but it's at least a tsql set solution

declare @TempString varchar(100)

set @TempString='textàè containing éìòaccentsç''''` and things-'

select @TempString=
    replace(
        replace(
            replace(
                replace(
                    replace(
                        replace(
                            replace(
                                replace(
                                    replace(
                                        replace(
                                            replace(@TempString,' ', '') 
                                        ,'à', 'a')
                                    ,'è', 'e') 
                                ,'é', 'e')
                            ,'ì', 'i')
                        ,'ò', 'o') 
                    ,'ù', 'u') 
                ,'ç', 'c') 
            ,'''', '') 
        ,'`', '')
    ,'-', '') 



select @TempString

edited Oct 27 '10 at 14:04

answered Oct 26 '10 at 18:18

DForck42

19,789
13
59
84

What do you call this a 'set solution'? It is a completely procedural batch. – Iain Samuel McLean Elder Jul 03 '12 at 15:47
@isme it's a set solution because i could dump the logic into a single update statement and update an entire table in one transaction, instead of having multiple transactions like the OP's original statement. – DForck42 Jul 03 '12 at 15:52
Can you demonstrate that by rewriting your code sample as a `SELECT` statement on a table containing many rows? – Iain Samuel McLean Elder Jul 03 '12 at 19:56
@isme I don't want to edit this into my answer, so here it is: http://textsnip.com/84da22 – DForck42 Jul 03 '12 at 20:20

score 2 · Answer 6 · edited May 23 '17 at 12:25

With two auxiliary tables in your schema, you can transform your data using just a SELECT statement.

First declare a view to implement a numbers table, from 1 to 65536. The following technique is due to Itzik Ben-Gan:

CREATE VIEW Sequence
AS
WITH T1(_) AS (SELECT NULL UNION ALL SELECT NULL),
T2(_) AS (SELECT NULL FROM T1 AS L CROSS JOIN T1 AS R),
T3(_) AS (SELECT NULL FROM T2 AS L CROSS JOIN T2 AS R),
T4(_) AS (SELECT NULL FROM T3 AS L CROSS JOIN T3 AS R),
T5(_) AS (SELECT NULL FROM T4 AS L CROSS JOIN T4 AS R)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Number
FROM T5;

Next declare a mapping between characters with diacritics and and their undiacritic equivlants. This example data is not complete mapping, but serves as an example:

CREATE TABLE UndiacriticMap (
  DiacriticCharacter CHAR(1) NOT NULL PRIMARY KEY,
  UndiacriticCharacter CHAR(1) NOT NULL
);

INSERT INTO UndiacriticMap (
  DiacriticCharacter,
  UndiacriticCharacter
)
VALUES
  ('à', 'a'),
  ('è', 'e'),
  ('é', 'e'),
  ('ì', 'i'),
  ('ò', 'o'),
  ('ç', 'c');

Finally declare a table with a column containing test data. The data is sourced from the question and from DForck42's answer:

CREATE TABLE TestData (
  ID INT NOT NULL PRIMARY KEY,
  String VARCHAR(50) NOT NULL
);

INSERT INTO TestData (
  ID,
  String
)
VALUES
  (1, 'textàè containing éìòaccentsç''''` and things-'),
  (2, 'aèàç=.32s df');

With these objects in place, the following SELECT statement processes the test data in a relational way, save for the the non-relational hack to concatenate the characters back together. The 'blackbox XML method' is due to Anith Sen:

WITH CharacterWise (ID, Symbol, Position) AS (
  SELECT
    ID,
    SUBSTRING(TestData.String, Sequence.Number, 1),
    Sequence.Number
  FROM TestData
  INNER JOIN Sequence ON
    Sequence.Number <= LEN(TestData.String)
),
Stripped (ID, Symbol, Position) AS (
  SELECT
    ID,
    CASE 
      WHEN UndiacriticMap.DiacriticCharacter IS NOT NULL
      THEN UndiacriticMap.UndiacriticCharacter
      ELSE CASE 
        WHEN CharacterWise.Symbol LIKE '[a-z]' COLLATE Latin1_General_BIN
        THEN CharacterWise.Symbol
        ELSE ''
      END
    END,
    CharacterWise.Position
  FROM CharacterWise
  LEFT OUTER JOIN UndiacriticMap ON
    UndiacriticMap.DiacriticCharacter = CharacterWise.Symbol
)
SELECT
  TestData.ID,
  TestData.String,
  (
    SELECT Stripped.Symbol AS [text()]
    FROM Stripped
    WHERE TestData.ID = Stripped.ID
    ORDER BY Stripped.Position
    FOR XML PATH('')
  ) AS StrippedString
FROM TestData;

The query produces the following result set:

ID          String                                             StrippedString
----------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1           textàè containing éìòaccentsç''` and things-       textaecontainingeioaccentscandthings
2           aèàç=.32s df                                       aeacsdf

score 2 · Answer 7 · edited Nov 21 '14 at 09:25

I know it's not elegant but when collate is not working and you cannot make use of functions you can use this nested replace for most used diacritics. I post this so you don't have to type it yourself again :)

select REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(myfield,'é','e'),'ê','e'),'ë','e'),'è','e'),'É','E'),'È','E'),'Ê','E'),'Ë','E'),'ð','D'),'Ð','D'),'â','a'),'à','a'),'á','a'),'ã','a'),'æ','a'),'à','a'),'å','a'),'Å','A'),'À','A'),'Á','A'),'Â','A'),'Ã','A'),'Æ','A'),'ä','a'),'Ä','A'),'ï','i'),'î','i'),'ì','i'),'í','i'),'Ì','I'),'Í','I'),'Î','I'),'Ï','I'),'ô','o'),'ò','o'),'ó','o'),'õ','o'),'ø','o'),'Ò','O'),'Ó','O'),'Ô','O'),'Õ','O'),'Ø','O'),'ö','o'),'Ö','O'),'û','u'),'ù','u'),'ú','u'),'Ù','U'),'Ú','U'),'Û','U'),'Ü','U'),'ü','u'),'ñ','n'),'Ñ','N'),'Ç','C'),'ç','c'),'ý','y'),'ÿ','y'),'Ý','Y'),'þ','T'),'Þ','t'),'ß','ss') from mytable

score 2 · Answer 8 · answered Feb 01 '16 at 09:54

2

Just two cents from me

select * From XXX  
    where cast(word as varchar(max)) collate SQL_Latin1_General_CP1253_CI_AI = 'sluiten' collate SQL_Latin1_General_CP1253_CI_AI

answered Feb 01 '16 at 09:54

Tschallacka

27,901
14
88
133

score 2 · Answer 9 · answered Oct 26 '10 at 13:50

AFAIK, there isn't a direct mapping for unicode/UTF-X characters that "look similar". Unless someone has something much cooler, I'd suggest pursuing a brute-force approach so you can get your work done until then.

It sounds like you need to do 2 passes. The first pass would be to replace letters that look similar first, then go through and remove all remaining non-English letters second.

This article can help you create a user defined function so that you can use regular expressions instead of dozens of REPLACE calls: http://msdn.microsoft.com/en-us/magazine/cc163473.aspx

Here is a dictionary that I've been using for this case:

    public static Dictionary<char, string> NonEnglishLetterMapping = new Dictionary<char, string>
    {
          {'a', "áàâãäåāăą"}
        //, {'b', ""}
        , {'c', "ćĉċč"}
        , {'d', "ďđ"}
        , {'e', "éëêèēĕėę"}
        //, {'f', ""}
        , {'g', "ĝğġģ"}
        , {'h', "ĥħ"}
        , {'i', "ìíîïĩīĭįı"}
        , {'j', "ĵ"}
        , {'k', "ķĸ"}
        , {'l', "ĺļľŀł"}
        //, {'m', ""}
        , {'n', "ñńņňŉŋ"}
        , {'o', "òóôõöōŏőơ"}
        //, {'p', ""}
        //, {'q', ""}
        , {'r', "ŕŗř"}
        , {'s', "śŝşšș"}
        , {'t', "ţťŧț"}
        , {'u', "ùúûüũūŭůűųư"}
        //, {'v', ""}
        , {'w', "ŵ"}
        //, {'x', ""}
        , {'y', "ŷ"}
        , {'z', "źżž"}
    };

Yes, thanks for the idea, anyway I am trying to obtain the desired result in T-SQL, without using CLR functions — UnDiUdin, Oct 26 '10 at 14:01

score 1 · Answer 10 · edited Jun 08 '16 at 15:23

select * from database_name.table_name 
where countries LIKE '%é%'
or countries like'%é%'
or countries like'%ê%'
or countries like'%ë%'
or countries like'%è%'
or countries like'%É%'
or countries like'%È%'
or countries like'%Ê%'
or countries like'%Ë%'
or countries like'%ð%'
or countries like'%Ð%'
or countries like'%â%'
or countries like'%à%'
or countries like'%á%'
or countries like'%ã%'
or countries like'%æ%'
or countries like'%à%'
or countries like'%å%'
or countries like'%Å%'
or countries like'%À%'
or countries like'%Á%'
or countries like'%Â%'
or countries like'%Ã%'
or countries like'%Æ%'
or countries like'%ä%'
or countries like'%Ä%'
or countries like'%ï%'
or countries like'%î%'
or countries like'%ì%'
or countries like'%í%'
or countries like'%Ì%'
or countries like'%Í%'
or countries like'%Î%'
or countries like'%Ï%'
or countries like'%ô%'
or countries like'%ò%'
or countries like'%ó%'
or countries like'%õ%'
or countries like'%ø%'
or countries like'%Ò%'
or countries like'%Ó%'
or countries like'%Ô%'
or countries like'%Õ%'
or countries like'%Ø%'
or countries like'%ö%'
or countries like'%Ö%'
or countries like'%û%'
or countries like'%ù%'
or countries like'%ú%'
or countries like'%Ù%'
or countries like'%Ú%'
or countries like'%Û%'
or countries like'%Ü%'
or countries like'%ü%'
or countries like'%ñ%'
or countries like'%Ñ%'
or countries like'%Ç%'
or countries like'%ç%'
or countries like'%ý%'
or countries like'%ÿ%'
or countries like'%Ý%'
or countries like'%þ%'
or countries like'%Þ%'
or countries like'%ß%';

score 1 · Answer 11 · answered Feb 21 '20 at 09:50

I was writing this answer for another question, but then the OP deleted the question just as I went to post it... So I'll post this here, as it's related. This doesn't use a WHILE or a Multi-Line Scalar Function (as is in the most upvoted answer), and I've provided an nvarchar and varchar solution:

`nvarchar`:

CREATE FUNCTION dbo.RemoveAccents_N (@String nvarchar(4000)) 
RETURNS table
AS RETURN

    WITH N AS(
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    Tally AS(
        SELECT TOP (LEN(@String)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
        FROM N N1, N N2, N N3, N N4)
    SELECT (SELECT CASE WHEN V.CS LIKE '[A-z]' THEN V.CS ELSE V.YS END
            FROM Tally T
                 CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1), CONVERT(varchar(4000),SUBSTRING(@String,T.I,1)) COLLATE SQL_Latin1_General_CP1253_CI_AI))V(YS,CS)
            FOR XML PATH(N''),TYPE).value('.','nvarchar(4000)') AS AccentlessString;

`varchar`:

CREATE FUNCTION dbo.RemoveAccents (@String varchar(8000)) 
RETURNS table
AS RETURN

    WITH N AS(
        SELECT N
        FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
    Tally AS(
        SELECT TOP (LEN(@String)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
        FROM N N1, N N2, N N3, N N4)
    SELECT (SELECT CASE WHEN V.CS LIKE '[A-z]' THEN V.CS ELSE V.YS END
            FROM Tally T
                 CROSS APPLY (VALUES(SUBSTRING(@String,T.I,1), SUBSTRING(@String,T.I,1) COLLATE SQL_Latin1_General_CP1253_CI_AI))V(YS,CS)
            FOR XML PATH(N''),TYPE).value('.','varchar(8000)') AS AccentlessString;

The functions can be called as below:

SELECT *
FROM (VALUES(N'Åìèë Öàíêîâ @11 ώЙ⅔♠♪'))V(YourString)
     CROSS APPLY dbo.RemoveAccents_N(V.YourString) RA;

score 0 · Answer 12 · answered Feb 06 '22 at 08:35

Simplified with:

CREATE OR ALTER FUNCTION F_CLEAN_ACCENT_STRING(@NAME NVARCHAR(128))
RETURNS NVARCHAR(128)
AS
BEGIN
   DECLARE @OUT NVARCHAR(128) = '', @I TINYINT = 1, @C CHAR(1);
   WHILE @I <= LEN(@NAME)
   BEGIN
      IF @C NOT LIKE '[A-Z0-9_]' COLLATE French_CI_AI
         SET @OUT = @OUT + '_';
      ELSE 
         SET @OUT = @OUT + CHAR(ASCII(SUBSTRING(@NAME, @I, 1) COLLATE SQL_Latin1_General_CP1253_CI_AI));
      SET @I += 1;
   END;
   RETURN @OUT;
END;
GO

SELECT dbo.F_CLEAN_ACCENT_STRING('Électricîté de Fränce') AS UNACCENT;

UNACCENT
----------------------
Electricite de France

score -1 · Answer 13 · answered Jan 04 '18 at 19:35

-1

in postgress 10:

create extension unaccent;# as root in Your db, for each db
select unaccent("ąęśłóŻŹŁÓĄĘ");

:)

answered Jan 04 '18 at 19:35

blackmoon

324
6
17

JonyD · Answer 14 · 2021-09-30T18:18:59.347

In mySql as a function (converting + other stuff)

DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `txt2slug`(`txt` VARCHAR(50)) RETURNS varchar(50) CHARSET utf8mb4
BEGIN   
    DECLARE pl_char VARCHAR(10) DEFAULT "ęóąśłżźćń";
    DECLARE en_char VARCHAR(10) DEFAULT "eoaslzzcn";
    DECLARE slug VARCHAR(50) DEFAULT (txt); 
    DECLARE pos INT DEFAULT 1;  
        
    SET slug = LOWER(TRIM(slug));   
    SET slug = REGEXP_REPLACE(slug, "[,;:!?.]", "");    
    SET slug = REGEXP_REPLACE(slug, " ", "-");  
        
    WHILE pos <= LENGTH(pl_char) DO     
        SET slug = REGEXP_REPLACE(slug, SUBSTRING(pl_char, pos, 1), SUBSTRING(en_char, pos, 1));
        SET pos = pos + 1;
    END WHILE;  
        
    RETURN slug;    
    
END$$
DELIMITER ;

How to remove accents and all chars <> a..z in sql-server?

14 Answers14

`nvarchar`:

`varchar`:

Linked

How to remove accents and all chars <> a..z in sql-server?

14 Answers14

nvarchar:

varchar:

Linked

`nvarchar`:

`varchar`: