Setting Turkish and English locale: translate Turkish characters to Latin equivalents

Question

I want to convert my Turkish strings to lowercase using both the English and the Turkish locale. I'm doing this:

String myString = "YAŞAR BAYRI";
Locale trlocale = new Locale("tr-TR");
Locale enLocale = new Locale("en_US");

Log.v("mainlist", "en source: " +myString.toLowerCase(enLocale));
Log.v("mainlist", "tr source: " +myString.toLowerCase(trlocale));

The output is:

en source: yaşar bayri

tr source: yaşar bayri

But I want to have an output like this:

en source: yasar bayri

tr source: yaşar bayrı

Is this possible in Java?

score 44 · Accepted Answer · answered Oct 22 '12 at 12:07

If you are using the Locale constructor, you can and must set the language, country and variant as separate arguments:

new Locale(language)
new Locale(language, country)
new Locale(language, country, variant)

Your test program therefore creates locales whose language code is the whole string "tr-TR" or "en_US", which Java does not recognize as Turkish or English. For your test program, use new Locale("tr", "TR") and new Locale("en", "US") instead.
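Below is a minimal sketch of the question's test for a plain JVM. It prints with System.out instead of Android's Log.v and compares the malformed single-argument constructor with the two-argument form. The class name and the lower helper are illustrative only. On Java 19 and later these constructors are deprecated in favour of Locale.of, but they still compile and work.

```java
import java.util.Locale;

public class LocaleConstructorDemo {

    // Lower-cases the input using the casing rules of the given locale.
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        String myString = "YAŞAR BAYRI";

        // Malformed: the whole string "tr-TR" becomes the language code,
        // so this locale is not treated as Turkish.
        Locale broken = new Locale("tr-TR");

        // Language and country passed as separate arguments.
        Locale trLocale = new Locale("tr", "TR");
        Locale enLocale = new Locale("en", "US");

        System.out.println("broken language: " + broken.getLanguage());
        System.out.println("tr source: " + lower(myString, trLocale)); // yaşar bayrı
        System.out.println("en source: " + lower(myString, enLocale)); // yaşar bayri
    }
}
```

With the two-argument Turkish locale, I lowercases to the dotless ı. The English result still contains ş, because lowercasing never removes accents.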

If you are using Java 1.7+, then you can also parse a language tag using Locale.forLanguageTag:

String myString = "YAŞAR BAYRI";
Locale trlocale = Locale.forLanguageTag("tr-TR");
Locale enLocale = Locale.forLanguageTag("en-US");

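Either way, toLowerCase then applies the casing rules of the intended language, so the Turkish locale lowercases I to the dotless ı. This does not remove accents, though. In the English locale the result is still "yaşar bayri", because ş is lowercased, not converted to s.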
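Below is a minimal sketch of the forLanguageTag route (Java 7+). It assumes a plain JVM, with System.out in place of Log.v, and LanguageTagDemo and parse are illustrative names. Language tags separate subtags with a hyphen. The sketch also prints what the underscore form "en_US" parses to, but it only claims that the result is not English.

```java
import java.util.Locale;

public class LanguageTagDemo {

    // Parses a BCP 47 language tag; subtags are separated by '-', not '_'.
    static Locale parse(String tag) {
        return Locale.forLanguageTag(tag);
    }

    public static void main(String[] args) {
        String myString = "YAŞAR BAYRI";

        Locale trLocale = parse("tr-TR");
        Locale enLocale = parse("en-US");

        System.out.println(trLocale.getLanguage() + " / " + trLocale.getCountry()); // tr / TR
        System.out.println("tr source: " + myString.toLowerCase(trLocale)); // yaşar bayrı
        System.out.println("en source: " + myString.toLowerCase(enLocale)); // yaşar bayri

        // "en_US" is not a well-formed language tag, so it does not parse as English.
        System.out.println("\"en_US\" parses to language: '" + parse("en_US").getLanguage() + "'");
    }
}
```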

score 11 · Answer 2 · answered Oct 22 '12 at 12:05


I think this is the problem:

Locale trlocale= new Locale("tr-TR");

Try this instead:

Locale trlocale= new Locale("tr", "TR");

That's the constructor to use to specify country and language.

answered Oct 22 '12 at 12:05

Jon Skeet


new Locale("tr") would actually be enough, since the capitalization rules for the Turkish language are independent of the country. – jarnbjo Oct 22 '12 at 12:36
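The sketch below checks the comment's point by lowercasing the same string with new Locale("tr") and new Locale("tr", "TR"). The Turkish dotted and dotless i rules in String.toLowerCase and toUpperCase depend only on the language. The class and helper names are illustrative.

```java
import java.util.Locale;

public class LanguageOnlyDemo {

    // Turkish casing selected by language alone, with no country given.
    static String lowerTurkish(String s) {
        return s.toLowerCase(new Locale("tr"));
    }

    // Same locale in the other direction: i uppercases to the dotted İ.
    static String upperTurkish(String s) {
        return s.toUpperCase(new Locale("tr"));
    }

    public static void main(String[] args) {
        String myString = "YAŞAR BAYRI";
        String withCountry = myString.toLowerCase(new Locale("tr", "TR"));

        System.out.println(lowerTurkish(myString));                     // yaşar bayrı
        System.out.println(lowerTurkish(myString).equals(withCountry)); // true
        System.out.println(upperTurkish("istanbul"));                   // İSTANBUL
    }
}
```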

score 5 · Answer 3 · edited Feb 16 '17 at 11:54


You can do this:

Locale trlocale = new Locale("tr", "TR");

The first argument is the language and the second is the country.

edited Feb 16 '17 at 11:54

CoffeDeveloper


answered Feb 16 '17 at 11:43

ugur tafrali


score 3 · Answer 4 · answered Oct 22 '12 at 12:04

If you just want the string in plain ASCII without accents, the following might work. First, decompose each accented character into its base ASCII letter followed by a combining diacritical mark (a zero-width accent). Then remove only those marks with a regular-expression replace.

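import java.text.Normalizer;

public static String withoutDiacritics(String s) {
    // Decompose each accented letter, e.g. ş into s + a combining cedilla (U+0327).
    String s2 = Normalizer.normalize(s, Normalizer.Form.NFD);
    // Remove the combining marks, leaving only the base letters.
    return s2.replaceAll("(?s)\\p{InCombiningDiacriticalMarks}", "");
}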
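You can get exactly the two outputs the question asks for by combining the answers. Lowercase with each locale's rules, then strip accents only for the English version. This sketch assumes that is the goal. The class name is illustrative, and withoutDiacritics repeats this answer's method without the redundant (?s) flag. The dotless ı (U+0131) has no decomposition, so the method leaves it unchanged.

```java
import java.text.Normalizer;
import java.util.Locale;

public class StripDiacriticsDemo {

    // Same technique as the answer: NFD-decompose, then drop combining marks.
    static String withoutDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}", "");
    }

    public static void main(String[] args) {
        String myString = "YAŞAR BAYRI";

        // English casing maps I to i; stripping then removes the cedilla from ş.
        String en = withoutDiacritics(myString.toLowerCase(Locale.ENGLISH));
        // Turkish casing keeps the dotless ı and the ş.
        String tr = myString.toLowerCase(new Locale("tr", "TR"));

        System.out.println("en source: " + en); // yasar bayri
        System.out.println("tr source: " + tr); // yaşar bayrı

        // ı has no canonical decomposition, so stripping leaves it alone.
        System.out.println(withoutDiacritics("bayrı")); // bayrı
    }
}
```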

score 1 · Answer 5 · answered Oct 22 '12 at 12:33

ş and s are different characters, so changing the locale cannot turn one into the other. You have to build a Turkish-to-English character table and do the mapping yourself. I once did this for Vietnamese, which has a lot of such characters. For Turkish you only have to deal with four or five, right? So, good luck!
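Below is a minimal sketch of such a hand-built table. It assumes Java 9+ for Map.ofEntries, and the class, TABLE and transliterate are hypothetical names. It covers the six Turkish-specific letters in both cases. Unlike the Normalizer approach, it also maps the dotless ı, because the table lists it explicitly.

```java
import java.util.Map;

public class TurkishTransliterator {

    // Hypothetical table: each Turkish-specific letter maps to the plain
    // Latin letter commonly used for it in ASCII-only text.
    private static final Map<Character, Character> TABLE = Map.ofEntries(
            Map.entry('ç', 'c'), Map.entry('Ç', 'C'),
            Map.entry('ğ', 'g'), Map.entry('Ğ', 'G'),
            Map.entry('ı', 'i'), Map.entry('İ', 'I'),
            Map.entry('ö', 'o'), Map.entry('Ö', 'O'),
            Map.entry('ş', 's'), Map.entry('Ş', 'S'),
            Map.entry('ü', 'u'), Map.entry('Ü', 'U'));

    // Replaces every character found in TABLE; all others pass through unchanged.
    static String transliterate(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            out.append(TABLE.getOrDefault(c, c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("yaşar bayrı")); // yasar bayri
        System.out.println(transliterate("YAŞAR BAYRI")); // YASAR BAYRI
    }
}
```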
