In Java 6,

System.out.println(String.valueOf('\u0130').toLowerCase());

prints i (U+0069), but in Java 7 it prints an i followed by a combining dot above (U+0069 U+0307), which renders as an i with two dots.

I understand it is a Turkish character, but how do I make Java 7 print the same output as Java 6 using this code?

System.out.println(inputText.toLowerCase());

The solution also needs to handle international text, so I cannot hardcode the toLowerCase call to use only the Turkish locale.

Holger Just
ikirankumar
  • I suspect you need to specify the Locale you are using (as the first argument). Java 7 probably uses a different default Locale. – Peter Lawrey May 07 '14 at 17:32
  • @PeterLawrey Yes, Java picks up the default locale through `Locale.getDefault()`, which is en_US.UTF-8 in my case. But I have read that in Java 7 this particular Turkish character is handled differently compared to previous versions. Reference: [link](http://grepalex.com/2013/02/14/java-7-and-the-dotted--and-dotless-i/) – ikirankumar May 07 '14 at 17:35
  • Consider specifying a [Normal Form](http://docs.oracle.com/javase/7/docs/api/java/text/Normalizer.Form.html). – heptadecagram May 07 '14 at 17:36
  • There are a number of characters where the upper-case, lower-case, or title-case form is two characters instead of one. This is more apparent for String. – Peter Lawrey May 07 '14 at 17:40
  • there should be a big warning in your code telling you that you are using toLowerCase without specifying a locale – njzk2 May 07 '14 at 17:40
  • @PeterLawrey Yes, agreed that there are two-character equivalents, but the lowercase equivalent of this particular character (İ) changed between Java versions, and I want to retain the same output independent of the Java version. – ikirankumar May 07 '14 at 17:46
  • @njzk2 I am not getting any warning since the program is picking up the default locale, which I mentioned earlier is en_US.UTF-8 – ikirankumar May 07 '14 at 17:48
  • @ikirankumar : ok, it's just Lint telling me this, then (a tool included in android sdk that perform various checks, including some on pure java, including adding a warning for using a toLowerCase without explicit locale) – njzk2 May 07 '14 at 17:50

1 Answer

There is a quite detailed blog post about this `toLowerCase` problem with the dotted I.

Let me try to summarize the essential parts:

In Java 7, `String.toLowerCase` has indeed changed and handles this character differently than in Java 6. The following code was added:

} else if (srcChar == '\u0130') { // LATIN CAPITAL LETTER I DOT
    lowerChar = Character.ERROR;
}

The effect of this change is as follows:

Basically the end result of this change is that for this specific case (the upper-case dotted I), Java 7 now consults a special Unicode character database (http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt), which provides data on complex case-mappings. Looking at this file you can see several lines for the upper-case dotted I:

CODE       LOWER   TITLE   UPPER  LANGUAGE
0130;  0069 0307;   0130;   0130;
0130;  0069;        0130;   0130;       tr;
0130;  0069;        0130;   0130;       az;
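
To make the three table rows concrete, here is a minimal sketch that lowercases U+0130 under a language-neutral locale and under Turkish, then prints the resulting code points. The class name `DottedIDemo` and the helper `codePoints` are hypothetical names chosen for this illustration; `Locale.forLanguageTag("tr")` is used here as an equivalent of `new Locale("tr")`.

```java
import java.util.Locale;

// Hypothetical demo class, not part of the JDK or the blog post.
final class DottedIDemo {

    /** Renders every code point of s as U+XXXX, separated by spaces. */
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE

        // No language condition applies: first row of the table (Java 7 and later).
        System.out.println(codePoints(dottedI.toLowerCase(Locale.ROOT)));

        // Turkish rules apply: the "tr" row of the table.
        System.out.println(codePoints(dottedI.toLowerCase(Locale.forLanguageTag("tr"))));
    }
}
```

On Java 7 and later the first line should show `U+0069 U+0307` and the second `U+0069`, matching the unconditional and `tr` rows respectively.
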
donfuxx
  • Yes @donfuxx, I have read the site. I wanted to know a better way to get the same output as Java 6. In the above link, the author suggests using `dumpUnicodeCodePoints(String.valueOf('\u0130').toLowerCase(new Locale("tr")));` But I can't afford to use only the Turkish locale while I am handling international text. – ikirankumar May 07 '14 at 17:42
  • Hm... maybe a quick & dirty String.replace after the String.toLowerCase will work for you then? @ikirankumar – donfuxx May 07 '14 at 17:53
  • @donfuxx Ya, that is there. I am still looking for cleaner approach that will handle similar translation discrepancies between java versions, if any – ikirankumar May 07 '14 at 17:59
  • @ikirankumar _getting the same output as java 6_ means that you ignore the fix that Java 7 made to the language. It may be a solution for you, but it ignores the language that required the fix, so, in a way, it's not Turkish friendly. But if you don't use Turkish (or the other languages that were fixed) in your system then of course you wouldn't need to care about that. – aliopi Jan 03 '18 at 11:34
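
The quick-and-dirty replace idea from the comments could be sketched roughly as follows. This is a variation on it, not a confirmed fix from the thread: it maps U+0130 to a plain `i` before lowercasing rather than after, so any `i` + U+0307 sequence already present in the input is left alone. The class and method names are hypothetical, and using `Locale.ROOT` instead of the default locale is an assumption made to keep the result independent of the machine's settings.

```java
import java.util.Locale;

// Hypothetical helper, written for illustration only.
final class LegacyLowerCase {

    /**
     * Lowercases input, mapping U+0130 to a plain 'i' (U+0069) as Java 6 did,
     * instead of the 'i' + U+0307 pair that Java 7 produces outside tr/az.
     */
    static String toLowerCaseLikeJava6(String input) {
        // Replace the dotted capital I first, so the special-casing
        // rule for U+0130 is never triggered during lowercasing.
        return input.replace('\u0130', 'i').toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toLowerCaseLikeJava6("\u0130STANBUL"));
    }
}
```

As aliopi points out, this deliberately discards the distinction that the Java 7 change preserves, so it only makes sense when matching the old output matters more than a reversible mapping for the dotted I.
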