Why are String.endsWith and String.startsWith not consistent?

Question

I have the below test case and only the first assertion passes. Why?

@Test
public void test() {
    String i1 = "i";
    String i2 = "İ".toLowerCase();

    System.out.println((int)i1.charAt(0)); // 105
    System.out.println((int)i2.charAt(0)); // 105

    assertTrue(i2.startsWith(i1));

    assertTrue(i2.endsWith(i1));
    assertTrue(i1.endsWith(i2));
    assertTrue(i1.startsWith(i2));
}

Update after replies

What I am trying to do is use startsWith and endsWith in a case-insensitive way, so that the expression below returns true.

"ALİ".toLowerCase().endsWith("i");

I guess it is different for C# and Java.

Can you please change your question so that you're not doing `toLowerCase()`? What is the character `toLowerCase()` outputs? — 4castle, Aug 04 '17 at 20:27
[`toLowerCase`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#toLowerCase--) does not take a string as an argument, and it doesn't return a boolean, so it can't evaluate to true. — nbrooks, Aug 04 '17 at 20:45
See also Java Bug [JDK-8020037 String.toLowerCase incorrectly increases length, if string contains \u0130 char](https://bugs.openjdk.java.net/browse/JDK-8020037) — Andreas, Aug 04 '17 at 21:01

that other guy · Accepted Answer · 2017-08-04T20:41:57.763

This happens because lowercasing İ ("latin capital letter i with dot above") in an English locale yields two characters: "latin small letter i" followed by "combining dot above".

This explains why the result starts with i but doesn't end with i (it ends with a combining diacritical mark instead).

In a Turkish locale, lowercase İ simply becomes "latin small letter i" in accordance with Turkish linguistics rules, and your code would therefore work.

Here's a test program to help figure out what's going on:

class Test {
  public static void main(String[] args) {
    // Lowercase using the JVM's default locale, as the code in the question does.
    char[] foo = args[0].toLowerCase().toCharArray();
    System.out.print("Lowercase " + args[0] + " has " + foo.length + " chars: ");
    // Print each UTF-16 code unit of the result in hex.
    for (int i = 0; i < foo.length; i++) {
      System.out.print("0x" + Integer.toHexString(foo[i]) + " ");
    }
    System.out.println();
  }
}

Here's what we get when we run it on a system configured for English:

$ LC_ALL=en_US.utf8 java Test "İ"
Lowercase İ has 2 chars: 0x69 0x307

Here's what we get when we run it on a system configured for Turkish:

$ LC_ALL=tr_TR.utf8 java Test "İ"
Lowercase İ has 1 chars: 0x69

This is even the specific example used by the API docs for String.toLowerCase(Locale), which is the method you can use to get the lowercase version in a specific locale, rather than the system default locale.
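
To make the locale dependence concrete without changing system settings, here is a minimal sketch that passes the locale explicitly to String.toLowerCase(Locale). The class name LowerCaseLocaleDemo and the helper hex are made up for this illustration, and Locale.ROOT is used as an assumed stand-in for "any non-Turkish locale".

```java
import java.util.Locale;

class LowerCaseLocaleDemo {
    // Hypothetical helper: formats each UTF-16 code unit of s as hex, space-separated.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("0x").append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dottedCapitalI = "\u0130"; // İ, written as an escape to avoid source-encoding issues
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Locale-neutral lowercasing: i followed by COMBINING DOT ABOVE
        System.out.println(hex(dottedCapitalI.toLowerCase(Locale.ROOT))); // 0x69 0x307
        // Turkish lowercasing: a plain i
        System.out.println(hex(dottedCapitalI.toLowerCase(turkish)));     // 0x69
    }
}
```

Passing the locale explicitly makes the result independent of how the machine running the code happens to be configured.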

Andreas · Answer 2 · 2017-08-04T20:40:39.467

İ is Unicode Character 'LATIN CAPITAL LETTER I WITH DOT ABOVE' (U+0130), and is a Java String with a length of 1.

"İ".toLowerCase() returns a Java String with a length of 2:

Unicode Character 'LATIN SMALL LETTER I' (U+0069) (a normal i).
Unicode Character 'COMBINING DOT ABOVE' (U+0307).

And that is because there is no such character as a 'LATIN SMALL LETTER I WITH DOT ABOVE'. It does not exist in Unicode.
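
If you want to check this yourself, the following sketch lists the Unicode name of every code point in the lowercased string using Character.getName. The class name CodePointNames and the method describe are hypothetical, and Locale.ROOT is an assumption used here so that the result does not depend on the default locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CodePointNames {
    // Hypothetical helper: returns one "U+XXXX NAME" entry per code point in s.
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        s.codePoints().forEach(cp ->
            out.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return out;
    }

    public static void main(String[] args) {
        String lower = "\u0130".toLowerCase(Locale.ROOT); // lowercase of İ, locale-neutral
        for (String line : describe(lower)) {
            System.out.println(line);
        }
        // Prints:
        // U+0069 LATIN SMALL LETTER I
        // U+0307 COMBINING DOT ABOVE
    }
}
```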

score 3 · Answer 3 · answered Aug 04 '17 at 20:35

After executing the toLowerCase() function, the string length is 2 instead of 1; the lower case version of that character is represented by two characters:

000> "İ".length()
===> 1
000> "İ".toLowerCase().length()
===> 2

The first character in its lowercase representation is a lowercase latin i, while the second character is a diacritic:

000> "İ".toLowerCase().charAt(0)
===> i
000> "İ".toLowerCase().charAt(1)
===> ̇

So the lowercase string does "start with" i, but it doesn't end with it.

ΦXocę 웃 Пepeúpa ツ · Answer 4 · 2017-08-04T20:48:16.743

Your test is failing because you are using the methods incorrectly.

The "İ" in String i2 = "İ" is the Turkish capital form of i, and if you don't pass a locale to the conversion, toLowerCase() uses the default locale and may not produce a plain i.

Using a locale may help :)

import java.util.Locale;

public static void main(String[] args) {
    String i1 = "i";
    // Lowercase with an explicit Turkish locale, so İ maps to a plain i
    String i2 = "İ".toLowerCase(Locale.forLanguageTag("tr-TR"));

    System.out.println((int)i1.charAt(0)); // 105
    System.out.println((int)i2.charAt(0)); // 105

    System.out.println(i2.startsWith(i1));
    System.out.println(i2.endsWith(i1));
    System.out.println(i1.endsWith(i2));
    System.out.println(i1.startsWith(i2));
}

The output will be:

105
105
true
true
true
true

What if I do not know the locale? — Mehmet Ataş, Aug 04 '17 at 20:46
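
One possible locale-independent approach, offered as a sketch rather than a definitive answer: skip toLowerCase() entirely and compare with String.regionMatches(boolean ignoreCase, ...), which folds case one char at a time and never changes the string length. The class AffixIgnoreCase and its two methods are hypothetical names introduced for this example.

```java
class AffixIgnoreCase {
    // Hypothetical helper: case-insensitive startsWith based on regionMatches.
    static boolean startsWithIgnoreCase(String s, String prefix) {
        return s.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    // Hypothetical helper: case-insensitive endsWith based on regionMatches.
    // If suffix is longer than s, the offset is negative and regionMatches returns false.
    static boolean endsWithIgnoreCase(String s, String suffix) {
        int offset = s.length() - suffix.length();
        return s.regionMatches(true, offset, suffix, 0, suffix.length());
    }

    public static void main(String[] args) {
        String ali = "AL\u0130"; // "ALİ", with İ written as an escape
        System.out.println(endsWithIgnoreCase(ali, "i"));   // true
        System.out.println(startsWithIgnoreCase(ali, "al")); // true
        System.out.println(endsWithIgnoreCase(ali, "x"));   // false
    }
}
```

The trade-off is that this per-character comparison is not locale-aware: it treats İ and i as equal, and likewise I and ı, which may or may not be what you want for Turkish text.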
