67

I have the following string

áéíóú

which I need to convert it to

aeiou

How can I achieve it? (I don't need to compare, I need the new string to save)

BrunoLM
  • 97,872
  • 84
  • 296
  • 452
  • I'm looking for solution which remove accents, but leave all foreign letters in Unicode. I.e. I need convert 'á' to 'a', but don't touch 'я' or 'жопа' – Mike Keskinov Jun 04 '12 at 20:32

6 Answers6

127

Try using COLLATE:

select 'áéíóú' collate SQL_Latin1_General_Cp1251_CS_AS

For Unicode data, try the following:

select cast(N'áéíóú' as varchar(max)) collate SQL_Latin1_General_Cp1251_CS_AS

I am not sure what you may lose in the translation when using the second approach.

Update

It looks like œ is a special case, and we have to handle upper and lower case separately. You can do it like this (this code is a good candidate for a user-defined function):

declare @str nvarchar(max) = N'ñaàeéêèioô; Œuf un œuf'
select cast(
    replace((
        replace(@str collate Latin1_General_CS_AS, 'Œ' collate Latin1_General_CS_AS, 'OE' collate Latin1_General_CS_AS) 
    ) collate Latin1_General_CS_AS, 'œ' collate Latin1_General_CS_AS, 'oe' collate Latin1_General_CS_AS) as varchar(max)
) collate SQL_Latin1_General_Cp1251_CS_AS 
-- Output:
-- naaeeeeioo; Oeuf un oeuf

User Defined Function

create function dbo.fnRemoveAccents(@str nvarchar(max))  
returns varchar(max) as
begin
return cast(
    replace((
        replace(@str collate Latin1_General_CS_AS, 'Œ' collate Latin1_General_CS_AS, 'OE' collate Latin1_General_CS_AS) 
    ) collate Latin1_General_CS_AS, 'œ' collate Latin1_General_CS_AS, 'oe' collate Latin1_General_CS_AS) as varchar(max)
) collate SQL_Latin1_General_Cp1251_CS_AS 
end
D'Arcy Rittich
  • 167,292
  • 40
  • 290
  • 283
  • 4
    Only works for non-unicode columns. I.e., will not work for `N'áéíóú'`. – GSerg Aug 26 '10 at 19:20
  • I'm looking for solution which remove accents, but leave all foreign letters in Unicode. I.e. I need convert 'á' to 'a', but don't touch 'я' or 'жопа' – Mike Keskinov Jun 04 '12 at 20:32
  • 1
    @MikeKeskinov this would be best asked as a new question – D'Arcy Rittich Jun 04 '12 at 22:31
  • just a heads up: this cast will mess up case insensitive like queries due to the collate, make sure you address that – Gaspa79 Feb 09 '18 at 14:06
  • 1
    Those solutions don't work for the word "bœuf". `SELECT cast(N'bœuf' as varchar(max)) collate SQL_Latin1_General_CP1253_CI_AI`. It results in "b?uf". I always use those "œ" letters when testing encoding because they often break everything... – electrotype Mar 04 '18 at 16:55
  • @RedFilter Yes, I do something really similar, but also with the "æ"/"Æ" character too, which seem to be the other special case! Also, in your example, I think `'Oe'` should be `'OE'`. – electrotype Mar 08 '18 at 16:57
  • Added UDF example. – D'Arcy Rittich Apr 09 '20 at 14:50
  • Hi, just wonder why convert into SQL_Latin1_General_Cp1251_CS_AS in the end? What special about this particular Code Page? – Xu Shaoyang Jan 15 '22 at 12:48
9

Use the translate function:

SELECT TRANSLATE(
N'INPUT: ïÜ×ÌùµŪč©īĐÃÙěÓńÿâŘåòÔÕłćýçÀŻūìóèůüíÄûØõäÕťżîŃà£êřßøŽÖáďÉęúÂĪāËžŮōÑÇĆź®Š¥ĘĒśŹĚŚšŸ¢ŁéąÈđÆÍÛĄÝĎēČÊŌŇöÏňëÎæãŤñÒÚĀÅÁô',
N'ÁÀÂÃÄÅàáâãäåĀāąĄæÆÇçćĆčČ¢©đĐďĎÈÉÊËèéêëěĚĒēęĘÌÍÎÏìíîïĪīłŁ£ÑñňŇńŃÒÓÔÕÕÖØòóôõöøŌōřŘ®ŠšśŚßťŤÙÚÛÜùúûüůŮŪūµ×¥ŸÿýÝŽžżŻźŹ', 
N'aaaaaaaaaaaaaaaaaaccccccccddddeeeeeeeeeeeeeeiiiiiiiiiilllnnnnnooooooooooooooooorrsssssttuuuuuuuuuuuuuxyyyyyzzzzzz');

-- OUTPUT: 'INPUT: iuxiuuuccidaueonyaraooolcycazuioeuuiauooaotzioaleosozoadeeuaiaezuoncczrsyeeszessycleaedaiuaydeceonoineiaatnouaaao'

Check this link to find more 'look-a-like' characters:

https://github.com/apache/lucene-solr/blob/1ca7067a810578d4e246b5434b9cdcec7145d230/lucene/analysis/common/src/java/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.java#L189

Yolofy
  • 431
  • 1
  • 6
  • 13
  • FYI: Works in SQL Server 2017+ – javdromero Jul 22 '22 at 16:41
  • 2
    NOTE that there are two bugs in the 3rd parameter: The correct translation should be `N'aaaaaaaaaaaaaaaaaaccccccccddddeeeeeeeeeeeeeeiiiiiiiiiilllnnnnnnooooooooooooooorrrsssssttuuuuuuuuuuuuuxyyyyyzzzzzz'` – flo Aug 18 '22 at 13:31
  • Chek my answer for a simpler approach also using TRANSLATE() – MarioVW Feb 08 '23 at 07:31
5

Sometimes, the string can have another COLLATION, so you still have accents in the result. In that case, you can use this line (based on this solution here):

SELECT convert(varchar, your_string) COLLATE SQL_Latin1_General_Cp1251_CS_AS;
jpmottin
  • 2,717
  • 28
  • 25
4

I had the same problem. In Greek for proper conversion to UPPER() you must suppress accent. Changing collation caused issues in other applications. Putting some REPLACE() functions I had more control on the behavior maintaining collation. Below is my ToUpperCaseGR function.

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO

    create FUNCTION ToUpperCaseGR
    (
     @word nvarchar(max)
    )
    RETURNS nvarchar(max)
    AS
    BEGIN
        -- Declare the return variable here
        declare @res nvarchar(max)
        set @res = UPPER(@word)
        set @res = replace(@res,'Ά','Α')
        set @res = replace(@res,'Έ','Ε')
        set @res = replace(@res,'Ί','Ι')
        set @res = replace(@res,'Ή','Η')
        set @res = replace(@res,'Ό','Ο')
        set @res = replace(@res,'Ύ','Υ')
        set @res = replace(@res,'Ώ','Ω')


        -- Return the result of the function
        RETURN @res

    END
    GO
1

Use this function:

CREATE FUNCTION [dbo].[F_RemoveDiacritics] (
 @String varchar(max)
)   RETURNS varchar(max)

AS BEGIN
DECLARE @StringResult VARCHAR(max);

select @StringResult= @String collate SQL_Latin1_General_Cp1251_CS_AS

return @StringResult


END
Esperento57
  • 16,521
  • 3
  • 39
  • 45
0

Use the TRANSLATE() function, together with COLLATE and an accent-insensitive (AI) collation to do an easy catch-all replace:

SELECT TRANSLATE('TÉST ínpüt' COLLATE Latin1_General_CS_AI, 'AEIOUaeiou', 'AEIOUaeiou')

-- OUTPUT: 'TEST input'
MarioVW
  • 2,225
  • 3
  • 22
  • 28