Your problem is with toLowerCase. Even if UTF-8 seems to solve the basic comparison problem, lower-casing is locale-dependent: Java cannot know which language's rules you want unless you tell it, and the no-argument toLowerCase() simply uses the JVM's default locale. For instance, in Turkish the lowercase of 'I' is 'ı', not 'i'.
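To make the Turkish case concrete, here is a minimal sketch. The class name `TurkishLowerCaseDemo` and the helper `lower` are only illustrative names, not part of your code; the sample word is an assumption too.

```java
import java.util.Locale;

public class TurkishLowerCaseDemo {

    // Lower-cases text using the rules of the given language tag (e.g. "tr", "en").
    static String lower(String text, String languageTag) {
        return text.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        String title = "TITLE";
        String turkish = lower(title, "tr"); // 'I' becomes dotless 'ı' (U+0131)
        String english = lower(title, "en"); // 'I' becomes plain 'i'

        // Compare against escaped literals so console encoding does not matter.
        System.out.println(turkish.equals("t\u0131tle")); // true
        System.out.println(english.equals("title"));      // true
        System.out.println(turkish.contains("title"));    // false
    }
}
```

The last line is the failure mode in question: the same input lower-cased under a different locale no longer matches a plain "title".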
First of all, start the application with java -Dfile.encoding=UTF-8 ... Running the application without UTF-8 encoding is a common mistake.
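If you are not sure which encoding the JVM actually picked up, a quick check along these lines may help. This is only a sketch; `DefaultCharsetCheck` is a made-up class name. Note that on JDK 18 and later UTF-8 is already the default, so the flag mostly matters on older runtimes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {

    // True when the JVM's default charset is UTF-8.
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        if (!defaultIsUtf8()) {
            System.out.println("Consider restarting with -Dfile.encoding=UTF-8");
        }
    }
}
```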
Here is my solution: I add all the desired locales and then test against each of them.
    import java.util.HashSet;
    import java.util.Locale;
    import java.util.Set;

    public class MultiLanguageComparator {

        // Locales whose lower-casing rules are tried in turn.
        private final Set<Locale> localeList = new HashSet<Locale>();

        public MultiLanguageComparator() {
            localeList.add(Locale.getDefault());
            localeList.add(Locale.ENGLISH);
        }

        // Also adds every available locale whose language tag starts with
        // localePrefix, e.g. "ar" or "ar-sa" (case-insensitive).
        public MultiLanguageComparator(String localePrefix) {
            this();
            localePrefix = localePrefix.toLowerCase(Locale.ENGLISH);
            for (Locale l : Locale.getAvailableLocales()) {
                // Tags look like "ar-SA", so lower-case them before comparing.
                if (l.toLanguageTag().toLowerCase(Locale.ENGLISH).startsWith(localePrefix)) {
                    localeList.add(l);
                }
            }
        }

        /**
         * Returns true if s1 contains s2, ignoring case under at least one
         * of the registered locales.
         *
         * @param s1 the text to search in
         * @param s2 the text to search for
         * @return true if a lower-cased s1 contains the lower-cased s2
         */
        public boolean contain(String s1, String s2) {
            for (Locale locale : localeList) {
                String tmp1 = s1.toLowerCase(locale);
                String tmp2 = s2.toLowerCase(locale);
                if (tmp1.contains(tmp2)) return true;
            }
            return false;
        }

        public static void main(String[] args) {
            String s1 = ....
            String s2 = ....
            // "ar" adds all Arabic locales; pass "ar-sa" for just the Saudi Arabia locale.
            MultiLanguageComparator comparator = new MultiLanguageComparator("ar");
            System.out.println(comparator.contain(s1, s2));
        }
    }