How to make a Java containsIgnoreCase that works with all human languages

Question

For example, I have this simple containsIgnoreCase method:

public static boolean containsIgnoreCase(String a, String b) {
    if (a == null || b == null) {
        return false;
    }
    return a.toLowerCase().contains(b.toLowerCase());
}

But it fails for some comparisons, such as ΙΧΘΥΣ & ιχθυσ.

So I switched to the Apache Commons Lang library:

import org.apache.commons.lang3.StringUtils;

which has its own method StringUtils.containsIgnoreCase:

public static boolean containsIgnoreCase2(String a, String b) {
    if (a == null || b == null) {
        return false;
    }

    return StringUtils.containsIgnoreCase(a, b);
}

Now it works for ΙΧΘΥΣ & ιχθυσ, but it fails for weiß & WEISS, tschüß & TSCHÜSS, ᾲ στο διάολο & Ὰͅ Στο Διάολο, ﬂour and water & FLOUR AND WATER.

So I wonder whether it is possible to create something that works for all languages, or am I missing something I need to configure in the Apache library?

I also saw that the icu4j library could be used, but I could not find an example:

<dependency>
    <groupId>com.ibm.icu</groupId>
    <artifactId>icu4j</artifactId>
    <version>72.1</version>
</dependency>

Any help or recommendation is appreciated :)

I don't get it: the German character ß is different from its other notation 'ss', so you cannot expect the two to be treated as equal in the end. – kladderradatsch, Nov 11 '22 at 23:08
@kladderradatsch hmm, I took this example from another question, my bad there. – BugsOverflow, Nov 11 '22 at 23:09
But you probably know the user's `Locale`, so why not use it? There are overloaded versions of both `toLowerCase(Locale)` and `toUpperCase(Locale)`. – Alexander Ivanchenko, Nov 11 '22 at 23:16
@AlexanderIvanchenko the users of the application are in America (South and North), so in this case would using Locale ENGLISH be enough? It also has users from Brazil, which is in South America too. But I was thinking that the application might expand in the future to, for example, China or Europe, and it would be nice if it already covered those cases. – BugsOverflow, Nov 11 '22 at 23:18
I wasn't suggesting hard-coding the Locale, but applying a **user-specific locale**. If it's a web application, then the client's preferred locale can be inferred from the `Accept-Language` header of the request, and you can obtain it through `HttpServletRequest.getLocale()`. – Alexander Ivanchenko, Nov 11 '22 at 23:28
@AlexanderIvanchenko thank you, I will try this. It sounds perfect for solving this little problem. – BugsOverflow, Nov 12 '22 at 03:53
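
The locale-aware suggestion from the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the class name `LocaleAwareContains` and the three-argument signature are made up for the example, and the locale is passed in directly instead of being read from `HttpServletRequest.getLocale()` so that the snippet stays self-contained.

```java
import java.util.Locale;

// Hypothetical helper class, named only for this example.
final class LocaleAwareContains {

    // Same idea as the method in the question, but the caller supplies the
    // locale instead of relying on the JVM default.
    static boolean containsIgnoreCase(String a, String b, Locale locale) {
        if (a == null || b == null || locale == null) {
            return false;
        }
        return a.toLowerCase(locale).contains(b.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // With the root locale, capital I lower-cases to plain i.
        System.out.println(containsIgnoreCase("TITLE", "title", Locale.ROOT));

        // With Turkish rules, capital I lower-cases to dotless i (U+0131),
        // so the same pair no longer matches.
        System.out.println(containsIgnoreCase("TITLE", "title", turkish));
    }
}
```

The point is only that the result of a case-insensitive match can depend on which locale's casing rules are applied, so the locale has to come from somewhere: the user, the request, or an explicit default such as `Locale.ROOT`.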

score 1 · Accepted Answer · answered Nov 11 '22 at 23:02


toLowerCase() and toUpperCase() are not always symmetric. Your examples work if you uppercase them instead:

public static boolean containsIgnoreCase(String a, String b) {
    if (a == null || b == null) {
        return false;
    }
    return a.toUpperCase().contains(b.toUpperCase());
}
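
To see the asymmetry on the pairs from the question, here is a small side-by-side check. It is a sketch under a few assumptions: the class and method names are invented for the example, `Locale.ROOT` is added (it is not in the answer's code) so the result does not depend on the JVM's default locale, and the non-ASCII strings are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.util.Locale;

// Hypothetical demo class, named only for this example.
final class UpperCaseContainsDemo {

    // Upper-case variant, as in the answer, with an explicit root locale.
    static boolean containsIgnoreCaseUpper(String a, String b) {
        if (a == null || b == null) {
            return false;
        }
        return a.toUpperCase(Locale.ROOT).contains(b.toUpperCase(Locale.ROOT));
    }

    // Lower-case variant, as in the question, for comparison.
    static boolean containsIgnoreCaseLower(String a, String b) {
        if (a == null || b == null) {
            return false;
        }
        return a.toLowerCase(Locale.ROOT).contains(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[][] pairs = {
            // Greek capitals vs. lowercase ending in a non-final sigma
            {"\u0399\u03A7\u0398\u03A5\u03A3", "\u03B9\u03C7\u03B8\u03C5\u03C3"},
            // "weiss" written with sharp s (U+00DF) vs. WEISS
            {"wei\u00DF", "WEISS"},
            // "tschuess" written with u-umlaut and sharp s vs. capitals
            {"tsch\u00FC\u00DF", "TSCH\u00DCSS"},
            // "flour" starting with the fl ligature (U+FB02) vs. capitals
            {"\uFB02our and water", "FLOUR AND WATER"},
        };
        for (String[] p : pairs) {
            System.out.printf("lower=%b upper=%b%n",
                    containsIgnoreCaseLower(p[0], p[1]),
                    containsIgnoreCaseUpper(p[0], p[1]));
        }
    }
}
```

For these four pairs the lower-case variant reports no match and the upper-case variant reports a match. The reason is that lower-casing turns a word-final Σ into ς and leaves ß and the ligature untouched, while upper-casing expands ß to SS and the ligature to FL. This only checks the listed pairs; it does not show that upper-casing is correct for every script.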


shmosel


Why is this answer not present in all the other discussions about this topic? Is this going to cause any unexpected behaviour? Or will it always do what we need, which is just to evaluate as the name says, containsIgnoreCase :) – BugsOverflow Nov 11 '22 at 23:11
I'm not sure. I wouldn't be shocked if there are some cases that only work with `toLowerCase()`. – shmosel Nov 12 '22 at 00:40
Yes, well, I searched more about this and came to the conclusion that making it work for all human languages is almost impossible without knowing the location/culture of the user. So for now I think it is okay to leave it working with lowercase or uppercase; if you later need to correct it for users in very specific places like Turkey, Germany or Greece, that is when you start asking for the user's location to adjust it. – BugsOverflow Nov 13 '22 at 03:33
