1

I have text normalizing method, I use it to convert non-English letters into English letters only.

I need to do the same functionality using SQL server


C# Method:

  private  string normalizeString(string inputWord)
        {
            StringBuilder stringBuilder = new StringBuilder();
            foreach (char c in inputWord.Trim().ToCharArray())
            {
                string normalizedChar = c.ToString()
                    .Normalize(NormalizationForm.FormD).Substring(0, 1);

                stringBuilder.Append(normalizedChar);
            }

            return stringBuilder.ToString();
        }

Example

Ä => A
ä => a
Ö => O
ö => o
Õ => O
õ => o
Ü => U
ü => u
Bakri Bitar
  • 1,543
  • 18
  • 29
  • 4
    Can you clarify your question a bit more? The letters 'A' and 'a' are called Latin letters, not English letters, in Unicode. They exist in multiple languages (e.g. Spanish, Italian), _not just English_. Are you trying to remove all diacritics? Or just certain diacritics? Or just diacritics over Latin letters? Or only letters that appear in Latin alphabet?. Because Cyrillic letters can have diacritics, too! How complex of an operation are you looking for? Unicode is hard. FWIW, [Unicode Normalization Forms are defined here](http://unicode.org/reports/tr15/) – Ray Toal Jul 13 '15 at 07:43
  • Yes I want to remove all diacritics. – Bakri Bitar Jul 13 '15 at 08:44

2 Answers2

2

if you want to remove diacritics you can use Collate

for example:

select 'áéíóú' collate SQL_Latin1_General_Cp1251_CS_AS

this will return "aeiou"

Source

Community
  • 1
  • 1
j.kahil
  • 276
  • 5
  • 16
1

As per this Question there's no such native function in SQL Server. What you can do is to create a CLR Function for that.

Community
  • 1
  • 1
gsharp
  • 27,557
  • 22
  • 88
  • 134