
I want to convert a string containing Turkish characters to lowercase, with the Turkish characters mapped to their English equivalents, e.g. "İĞŞÇ" -> "igsc".

When I use toLowerCase(new Locale("en", "US")), it converts İ, for example, to an i followed by a combining dot (i̇) rather than to a plain i.

How can I solve this problem? (I'm using Java 7)

Thank you.

Pooya
  • Does this help: http://grepalex.com/2013/02/14/java-7-and-the-dotted--and-dotless-i/ ? – Rahul Tripathi Feb 24 '16 at 09:08
  • Welcome to Stack Overflow! Please take the [tour](http://stackoverflow.com/tour) and read [How to Ask](http://stackoverflow.com/help/how-to-ask) to learn what we expect from questions here. Please be aware that we do not provide from-scratch coding service here. Please show us what you've tried already, how it failed and we might be able to help. – Jørgen R Feb 24 '16 at 09:08

2 Answers

You can do this in two steps.

1) First, remove the accents.

The following comes from the topic "Is there a way to get rid of accents and convert a whole string to regular letters?":

Use java.text.Normalizer to handle this for you.

string = Normalizer.normalize(string, Normalizer.Form.NFD);

This will separate all of the accent marks from their base characters. Then you just need to throw out every character that is not plain ASCII:

string = string.replaceAll("[^\\p{ASCII}]", "");

If your text is in Unicode and you want to keep its other non-ASCII characters, you should use this instead:

string = string.replaceAll("\\p{M}", "");

For Unicode, \P{M} (uppercase) matches the base glyph and \p{M} (lowercase) matches each accent.
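
To make the difference between the two patterns concrete, here is a minimal sketch. The class and method names are made up for illustration. It runs both replacements on a string that also contains the Turkish dotless ı (U+0131), which has no decomposition: \p{M} leaves it in place, while [^\p{ASCII}] drops it.

```java
import java.text.Normalizer;

class StripMarksDemo {
    // Hypothetical helper: decompose, then remove only the combining marks.
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Hypothetical helper: decompose, then remove every non-ASCII character.
    static String stripNonAscii(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "İĞŞÇ ı" written with escapes so the source file encoding does not matter
        String input = "\u0130\u011E\u015E\u00C7 \u0131";
        System.out.println(stripMarks(input));    // base letters kept, dotless i kept
        System.out.println(stripNonAscii(input)); // base letters kept, dotless i removed
    }
}
```

Which pattern fits depends on whether the remaining non-ASCII letters should survive or disappear.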

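2) Then convert the remaining string to lower case. Pass an explicit non-Turkish locale, because with a Turkish default locale toLowerCase() maps I to the dotless ı:

string = string.toLowerCase(Locale.ENGLISH);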
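
Put together, the two steps might look like the sketch below. The class name, the method name toAsciiLower, and the extra replacement of the dotless ı are assumptions added for illustration and are not part of the quoted answer. The code uses only java.text.Normalizer and java.util.Locale, both available in Java 7.

```java
import java.text.Normalizer;
import java.util.Locale;

class TurkishToAscii {
    // Hypothetical helper combining both steps.
    static String toAsciiLower(String s) {
        // Step 1: split accented letters into base letter + combining mark, then drop the marks
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // Step 2: lower-case with a fixed locale so the JVM default locale cannot change the result
        String lower = stripped.toLowerCase(Locale.ENGLISH);
        // Assumption: the dotless i (U+0131) has no decomposition, so it is mapped by hand
        return lower.replace('\u0131', 'i');
    }

    public static void main(String[] args) {
        // "İĞŞÇ" written with escapes so the source file encoding does not matter
        System.out.println(toAsciiLower("\u0130\u011E\u015E\u00C7")); // prints "igsc"
    }
}
```

The manual replacement only matters when the input can already contain a lowercase ı; for all-uppercase input such as the example in the question, the first two steps are enough.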
Arnaud
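String testString = "İĞŞÇ";
System.out.println(testString);
// Pass language and country separately; new Locale("tr-TR") does not produce a Turkish locale
Locale trLocale = new Locale("tr", "TR");
testString = testString.toLowerCase(trLocale);
// Lower-cased with Turkish rules; the accents are still present
System.out.println(testString);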

Works like a charm :)

Seth
  • The least I can say is that your solution is... not universal. When I try it, I get "iğşç", while the OP asked for "igsc"... – Alain BECKER Mar 22 '22 at 13:45