I am using a simple dictionary to replace Cyrillic letters with Latin ones. Most of the time it works just fine, but I am having issues when the input already contains some Latin letters, which is usually the case with company names.
A few examples:
PROCRED is being converted as RROSRED
ОВЕХ as OVEH
CITY as SITU
What can I do about this?
This is the dictionary I am using:
public string ConvertCyrillicToLatin(string text)
{
    Dictionary<string, string> words = new Dictionary<string, string>();

    // Uppercase Cyrillic -> Latin
    words.Add("А", "A");
    words.Add("Б", "B");
    words.Add("В", "V");
    words.Add("Г", "G");
    words.Add("Д", "D");
    words.Add("Ђ", "Đ");
    words.Add("Е", "E");
    words.Add("Ж", "Ž");
    words.Add("З", "Z");
    words.Add("И", "I");
    words.Add("Ј", "J");
    words.Add("К", "K");
    words.Add("Л", "L");
    words.Add("Љ", "Lj");
    words.Add("М", "M");
    words.Add("Н", "N");
    words.Add("Њ", "Nj");
    words.Add("О", "O");
    words.Add("П", "P");
    words.Add("Р", "R");
    words.Add("С", "S");
    words.Add("Т", "T");
    words.Add("Ћ", "Ć");
    words.Add("У", "U");
    words.Add("Ф", "F");
    words.Add("Х", "H");
    words.Add("Ц", "C");
    words.Add("Ч", "Č");
    words.Add("Џ", "Dž");
    words.Add("Ш", "Š");

    // Lowercase Cyrillic -> Latin
    words.Add("а", "a");
    words.Add("б", "b");
    words.Add("в", "v");
    words.Add("г", "g");
    words.Add("д", "d");
    words.Add("ђ", "đ");
    words.Add("е", "e");
    words.Add("ж", "ž");
    words.Add("з", "z");
    words.Add("и", "i");
    words.Add("ј", "j");
    words.Add("к", "k");
    words.Add("л", "l");
    words.Add("љ", "lj");
    words.Add("м", "m");
    words.Add("н", "n");
    words.Add("њ", "nj");
    words.Add("о", "o");
    words.Add("п", "p");
    words.Add("р", "r");
    words.Add("с", "s");
    words.Add("т", "t");
    words.Add("ћ", "ć");
    words.Add("у", "u");
    words.Add("ф", "f");
    words.Add("х", "h");
    words.Add("ц", "c");
    words.Add("ч", "č");
    words.Add("џ", "dž");
    words.Add("ш", "š");

    // Replace every occurrence of each Cyrillic letter in the whole text
    var source = text;
    foreach (KeyValuePair<string, string> pair in words)
    {
        source = source.Replace(pair.Key, pair.Value);
    }
    return source;
}
UPDATE 1
As requested in the comment, here is my exemption list:
"СIТУ":"CITY",
"OBEX":"OBEX"
For now it contains just these two entries, for testing, but it is impossible to maintain a really functional exemption list with so many possibilities.
I expect that when the application comes across a Latin letter, it simply ignores it and leaves it as it is. It already works like that for Latin letters which do not exist in Cyrillic, or which exist but have the same meaning, like the letters AEODGTEJKLMN... I am having issues with letters which look the same in both the Latin and Cyrillic alphabets but have different meanings, letters like С(S), Х(H), У(Y), P(R)...
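To check which of the look-alike letters in a given input are actually stored as Cyrillic characters, a minimal sketch like the one below could be used. The class `ScriptInspector` and its methods are hypothetical helpers written only for illustration, not part of my application. It assumes that only the basic Cyrillic block (U+0400 to U+04FF) and the plain letters A-Z and a-z need to be told apart.

```csharp
using System;
using System.Text;

// Hypothetical diagnostic helper: reports the script of each character.
public static class ScriptInspector
{
    // Assumption: only the basic Cyrillic block (U+0400..U+04FF) is relevant.
    public static bool IsCyrillic(char c) => c >= '\u0400' && c <= '\u04FF';

    // Plain A-Z / a-z only; letters such as Đ or Š are deliberately not covered.
    public static bool IsBasicLatin(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

    // Returns one line per character: the character, its code point, its script.
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            string script = IsCyrillic(c) ? "Cyrillic"
                          : IsBasicLatin(c) ? "Latin"
                          : "other";
            sb.AppendLine($"{c} U+{(int)c:X4} {script}");
        }
        return sb.ToString();
    }
}
```

For example, `Console.Write(ScriptInspector.Describe("\u0421ITY"));` reports the first character as `U+0421 Cyrillic` and the remaining three as `Latin`, even though the word looks like plain Latin text on screen.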
UPDATE 2
Here are a few examples of input, as asked for in the comments. The slashes of course do not exist in the input; I just added them so that you can distinguish the Latin part.
...ПОВЕРИОЦ /LЕNS OBEX DОО/, У СКЛАДУ СА ОДРЕДБОМ...
...ИЗЈАВА ПРИВРЕДНОГ ДРУШТВА /GRАDЈЕVINSКО РRЕDUZЕСЕ IМРЕХ LОZNIСА/ СА АДРЕСОМ...
...ЗА УГОВОР О ОТВАРАЊУ КРЕДИТНЕ ЛИНИЈЕ СА КОМПАНИЈОМ /"DOWN CITУ"/ И РАСПОН МЕСЕЧНЕ КАМАТНЕ СТОПЕ...
...КОРИСТ ПОВЕРИОЦА /ATР BANK TOUR/, СА СЕДИШТЕМ...