How to convert Turkish chars to English chars in a string?

Question

string strTurkish = "ÜST";

How can I make the value of strTurkish "UST"?

Will all Turkish characters map to only one character in the range of A-Z a-z? — makerofthings7, Dec 01 '12 at 15:22

score 28 · Accepted Answer · answered Dec 01 '12 at 15:25

using System.Globalization;
using System.Linq;
using System.Text;

var text = "ÜST";
var unaccentedText = string.Join("", text.Normalize(NormalizationForm.FormD)
        .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));

L.B

This won't normalize `ı`. Any other solution? – sertsedat Mar 07 '16 at 11:37

`var text = "ÜST"; var unaccentedText = String.Join("", text.Normalize(NormalizationForm.FormD) .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)).Replace("ı", "i");` //swh – Ercument Eskar Feb 24 '17 at 11:59

Answer 2 · answered Apr 24 '15 at 11:28 · ogun

You can use the following method to solve your problem. The other methods do not correctly convert the Turkish lowercase dotless i (ı, \u0131).

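using System.Globalization;
using System.Text;

public static string RemoveDiacritics(string text)
{
    Encoding srcEncoding = Encoding.UTF8;
    Encoding destEncoding = Encoding.GetEncoding(1252); // Windows-1252 (Western European Latin alphabet)

    // Re-encode into Windows-1252. This relies on the encoder's default best-fit
    // behavior to map characters outside that code page (such as 'ı') to a close Latin letter.
    text = destEncoding.GetString(Encoding.Convert(srcEncoding, destEncoding, srcEncoding.GetBytes(text)));

    // Decompose accented letters into a base letter followed by combining marks...
    string normalizedString = text.Normalize(NormalizationForm.FormD);
    StringBuilder result = new StringBuilder();

    // ...then keep every character except the combining (non-spacing) marks.
    for (int i = 0; i < normalizedString.Length; i++)
    {
        if (CharUnicodeInfo.GetUnicodeCategory(normalizedString[i]) != UnicodeCategory.NonSpacingMark)
        {
            result.Append(normalizedString[i]);
        }
    }

    return result.ToString();
}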

For .NET Core (tested on 3.1+), you need to add `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);` because `Encoding.GetEncoding(1252);` is not supported by default. – Burak, Sep 17 '21 at 06:10
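A minimal sketch of that registration step, assuming the code-pages provider is available to the project (on some older targets it comes from the System.Text.Encoding.CodePages NuGet package). The `Program` class is only a placeholder host for the call:

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        // Register once at startup; without this, GetEncoding(1252) throws on .NET Core.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // After registration, code page 1252 resolves normally.
        Encoding windows1252 = Encoding.GetEncoding(1252);
        Console.WriteLine(windows1252.WebName); // windows-1252
    }
}
```

Registering the provider is a process-wide setting, so it only needs to happen once, before the first call to `RemoveDiacritics`.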

score 8 · Answer 3 · answered Dec 01 '12 at 15:22

I'm not an expert on this sort of thing, but I think you can use string.Normalize to do it: decompose the value so that accented letters become a base letter followed by combining marks, then remove any non-ASCII characters:

using System;
using System.Linq;
using System.Text;

class Test
{
    static void Main()
    {
        string text = "\u00DCST";
        string normalized = text.Normalize(NormalizationForm.FormD);
        string asciiOnly = new string(normalized.Where(c => c < 128).ToArray());
        Console.WriteLine(asciiOnly);
    }    
}

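It's entirely possible that this does horrible things in some cases, though. For example, a character with no decomposition, such as the dotless `ı`, is dropped entirely instead of being converted to `i`.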

score 7 · Answer 4 · answered Dec 14 '20 at 10:20

public string TurkishCharacterToEnglish(string text)
{
    char[] turkishChars = {'ı', 'ğ', 'İ', 'Ğ', 'ç', 'Ç', 'ş', 'Ş', 'ö', 'Ö', 'ü', 'Ü'};
    char[] englishChars = {'i', 'g', 'I', 'G', 'c', 'C', 's', 'S', 'o', 'O', 'u', 'U'};
    
    // Replace each Turkish character with its English counterpart at the same index
    for (int i = 0; i < turkishChars.Length; i++)
        text = text.Replace(turkishChars[i], englishChars[i]);

    return text;
}

score 3 · Answer 5 · answered Apr 22 '13 at 10:24

This problem does not require a general solution. There are only 12 special characters in the Turkish alphabet that need to be normalized: ı, İ, ö, Ö, ç, Ç, ü, Ü, ğ, Ğ, ş, Ş. You can write 12 rules that replace them with their English counterparts: i, I, o, O, c, C, u, U, g, G, s, S.
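A minimal C# sketch of those 12 rules, assuming C# 8 or later (for the switch expression). The class name `TurkishText` and method name `ToEnglishLetters` are hypothetical:

```csharp
using System;
using System.Text;

static class TurkishText
{
    // Hypothetical helper: applies the 12 one-to-one replacement rules.
    public static string ToEnglishLetters(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(c switch
            {
                'ı' => 'i', 'İ' => 'I',
                'ö' => 'o', 'Ö' => 'O',
                'ç' => 'c', 'Ç' => 'C',
                'ü' => 'u', 'Ü' => 'U',
                'ğ' => 'g', 'Ğ' => 'G',
                'ş' => 's', 'Ş' => 'S',
                _ => c // every other character is kept unchanged
            });
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ToEnglishLetters("ÜST"));         // UST
        Console.WriteLine(ToEnglishLetters("Işık çiçeği")); // Isik cicegi
    }
}
```

Because each rule maps one character to exactly one character, a single pass with a `StringBuilder` is enough; no Unicode normalization is involved, so the dotless `ı` is handled like any other letter.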

score 2 · Answer 6 · answered Dec 04 '14 at 20:36

Public Function Ceng(ByVal _String As String) As String
    Dim Source As String = "ığüşöçĞÜŞİÖÇ"
    Dim Destination As String = "igusocGUSIOC"
    For i As Integer = 0 To Source.Length - 1
        _String = _String.Replace(Source(i), Destination(i))
    Next
    Return _String
End Function

score 0 · Answer 7 · answered Jan 26 '22 at 13:43

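using System.Collections.Generic;
using System.Linq;
using System.Text;

// Extension methods must live in a non-generic static class (the class name is illustrative).
public static class StringExtensions
{
    // Built once instead of on every call.
    private static readonly Dictionary<char, char> TurkishChToEnglishChDic = new Dictionary<char, char>()
    {
        {'ç','c'},
        {'Ç','C'},
        {'ğ','g'},
        {'Ğ','G'},
        {'ı','i'},
        {'İ','I'},
        {'ş','s'},
        {'Ş','S'},
        {'ö','o'},
        {'Ö','O'},
        {'ü','u'},
        {'Ü','U'}
    };

    public static string TurkishChrToEnglishChr(this string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        // Copy each character, swapping Turkish letters for their English counterparts.
        return text.Aggregate(new StringBuilder(text.Length), (sb, chr) =>
            sb.Append(TurkishChToEnglishChDic.TryGetValue(chr, out char english) ? english : chr))
            .ToString();
    }
}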

There are **six existing answers** to this question, including a top-voted answer with over **twenty votes**. Are you _certain_ your solution hasn't already been given? If not, why do you believe your approach improves upon the existing proposals, which have been validated by the community? Offering an explanation is _always_ useful on Stack Overflow, but it's _especially_ important where the question has been resolved to the satisfaction of both the OP and the community. Help readers out by explaining what your answer does different and when it might be preferred. — Jeremy Caney, Jan 27 '22 at 00:07
