2

I am building an application that supports both Arabic and English.

I have a list of strings, and I want the user to enter a string so that I can check whether it occurs in the list.

I use this:

String userstring = ...; // the string entered by the user

for (int i = 0; i < allFoods.size(); i++) {
    if (allFoods.get(i).toLowerCase().contains(userstring.toLowerCase())) {
        // do something here
    }
}

This code works perfectly when the user enters English words, but I get no results when the user enters an Arabic string.

What am I doing wrong, and what should I do?

Thank you.

Edit: I don't want to sort or order strings; I want to check whether one string contains another.

mavrosxristoforos
Marco Dinatsoli

4 Answers

3

If you want to do string comparison, you can use the Collator API:

List<String> list = ...;

// create collator for arabic
Collator collator = Collator.getInstance(new Locale("ar"));
collator.setDecomposition(Collator.FULL_DECOMPOSITION);
collator.setStrength(Collator.SECONDARY); // ignores lower/upper case

// sort list
Collections.sort(list, collator);
// or use it as any other comparator

I don't know if this API can somehow be used to test if a String is contained in another.
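One way a Collator could be used for a containment test is to compare every substring of the text against the search string. The following is only a minimal brute-force sketch, not a tuned search: the class name CollatorContains and both helper methods are hypothetical, the cost grows quadratically with the text length, and it assumes plain java.text.Collator is sufficient for your data.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorContains {

    /**
     * Returns true if some substring of text compares as equal to pattern
     * under the given collator. Brute force: tries every start/end pair.
     */
    static boolean contains(String text, String pattern, Collator collator) {
        if (pattern.isEmpty()) {
            return true;
        }
        for (int start = 0; start < text.length(); start++) {
            for (int end = start + 1; end <= text.length(); end++) {
                if (collator.compare(text.substring(start, end), pattern) == 0) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Same collator settings as in the answer above. */
    static Collator arabicCollator() {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("ar"));
        collator.setDecomposition(Collator.FULL_DECOMPOSITION);
        collator.setStrength(Collator.SECONDARY); // case differences are ignored
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = arabicCollator();
        // Latin text, different case
        System.out.println(contains("Hello World", "WORLD", collator));
        // Arabic text written with Unicode escapes to avoid source-encoding issues
        String text = "\u062A\u0641\u0627\u062D \u0623\u062D\u0645\u0631";
        String word = "\u062A\u0641\u0627\u062D";
        System.out.println(contains(text, word, collator));
    }
}
```

Comparing substrings rather than lowercasing both sides leaves the decision about which differences matter to the collator's strength setting.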

isnot2bad
1

Your problem is with toLowerCase. Even if UTF-8 seems to solve the basic comparison problem, lowercasing is locale-dependent: when called without arguments, toLowerCase() uses the JVM's default locale, which may not apply the rules you expect. For instance, in Turkish the lowercase of 'I' is 'ı', not 'i'.
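The locale dependence can be seen in a small sketch. The class and method names below are hypothetical and only for illustration. The Arabic word is written with Unicode escapes so the example does not depend on the source file encoding. Arabic script has no upper or lower case, so lowercasing leaves it unchanged.

```java
import java.util.Locale;

class LowerCaseLocaleDemo {

    /** Lowercases with Turkish rules, where 'I' maps to dotless 'ı'. */
    static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    /** Lowercases with locale-neutral rules. */
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(lowerRoot("TITLE"));    // title
        System.out.println(lowerTurkish("TITLE")); // tıtle (dotless i, U+0131)

        String arabic = "\u062A\u0641\u0627\u062D";
        // Arabic letters have no case, so the string is unchanged
        System.out.println(arabic.equals(lowerRoot(arabic))); // true
    }
}
```

Passing an explicit locale such as Locale.ROOT makes the result independent of the machine's default locale.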

First of all, start the application with java -Dfile.encoding=UTF-8 ... Running the application without UTF-8 encoding is a common mistake.

Here is my solution: I add all the desired locales and then test with each of them.

import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class MultiLanguageComparator {

    // Locales whose lowercasing rules are tried in turn
    private final Set<Locale> localeList = new HashSet<Locale>();

    public MultiLanguageComparator() {
        localeList.add(Locale.getDefault());
        localeList.add(Locale.ENGLISH);
    }

    /**
     * Also adds every available locale whose language tag starts with
     * the given prefix, e.g. "ar" or "ar-sa".
     */
    public MultiLanguageComparator(String localePrefix) {
        this();
        localePrefix = localePrefix.toLowerCase(Locale.ENGLISH);
        for (Locale l : Locale.getAvailableLocales()) {
            // Language tags look like "ar-SA", so lowercase them before matching
            if (l.toLanguageTag().toLowerCase(Locale.ENGLISH).startsWith(localePrefix)) {
                localeList.add(l);
            }
        }
    }

    /**
     * Returns true if s1 contains s2 after lowercasing both
     * with at least one of the registered locales.
     *
     * @param s1 the text to search in
     * @param s2 the text to search for
     * @return true if s1 contains s2 under any registered locale
     */
    public boolean contain(String s1, String s2) {
        for (Locale locale : localeList) {
            String tmp1 = s1.toLowerCase(locale);
            String tmp2 = s2.toLowerCase(locale);
            if (tmp1.contains(tmp2)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String s1 = ...; // text to search in
        String s2 = ...; // text to search for
        // "ar" adds all Arabic locales; use "ar-sa" to add only the Saudi Arabia locale
        MultiLanguageComparator comparator = new MultiLanguageComparator("ar");
        System.out.println(comparator.contain(s1, s2));
    }
}
hevi
0

I had a problem comparing German strings containing umlauts. I used Unicode escapes and that solved my problem. You can find the list here.

I used the Unicode escapes directly in the string.

String mystring = "GERÄT";
mystring.equals("GER\u00C4T");
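Unicode escapes remove the dependence on the source file encoding, but the same visible character can still be stored in more than one way. As a hedged sketch (the class and method names are hypothetical), the example below compares a precomposed "Ä" (U+00C4) with "A" followed by a combining diaeresis (U+0308). The two only match after both sides are normalized with java.text.Normalizer.

```java
import java.text.Normalizer;

class UmlautNormalization {

    /** Compares two strings after bringing both to the composed form NFC. */
    static boolean equalsNormalized(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String composed = "GER\u00C4T";    // Ä as one code point
        String decomposed = "GERA\u0308T"; // A + combining diaeresis

        System.out.println(composed.equals(decomposed));           // false
        System.out.println(equalsNormalized(composed, decomposed)); // true
    }
}
```

Whether normalization is relevant depends on where the strings come from, which the answer above does not say.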
0xM4x
-1

Convert your strings' charset to ISO-8859-6 (Arabic) before comparing.

Converting a charset in Java:

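import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

Charset utf8charset = Charset.forName("UTF-8");
Charset iso88596charset = Charset.forName("ISO-8859-6");

// sample input: two bytes of UTF-8 encoded text
ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte)0xC3, (byte)0xA2});

// decode UTF-8 bytes into characters
CharBuffer data = utf8charset.decode(inputBuffer);

// encode the characters as ISO-8859-6
ByteBuffer outputBuffer = iso88596charset.encode(data);

// copy only the encoded bytes; array() may expose unused capacity
byte[] outputData = new byte[outputBuffer.remaining()];
outputBuffer.get(outputData);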

Code taken here.
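To connect this to the string comparison in the question: a Java String holds characters, not bytes, so a charset only matters at the point where bytes are decoded into a String. The sketch below is an illustration under that assumption, not part of the answer's approach, and its class and method names are hypothetical. Arabic text decoded with the right charset supports contains() directly, while the same bytes decoded with the wrong charset do not.

```java
import java.nio.charset.StandardCharsets;

class DecodeBeforeCompare {

    /** Decodes bytes that are known to be UTF-8. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes the same bytes with a mismatched charset, for comparison. */
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Arabic text written with Unicode escapes
        String original = "\u062A\u0641\u0627\u062D \u0623\u062D\u0645\u0631";
        String word = "\u062A\u0641\u0627\u062D";

        // Simulate bytes arriving from a file or the network
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeUtf8(bytes).contains(word));   // true
        System.out.println(decodeLatin1(bytes).contains(word)); // false
    }
}
```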

Community
Russell Gutierrez
  • First of all, thank you for your answer. Secondly, please don't just copy code from other questions. Thirdly, where is the string that I should convert? Fourthly, where is the resulting string? – Marco Dinatsoli Dec 05 '13 at 07:48