I'm aware of the possible duplicate, "How does the Google "Did you mean?" Algorithm work?". However, my problem is not about translation, transliteration, or spelling correction.
I am developing a Java application that maps each key on the English keyboard layout to the character the same physical key types on the Arabic layout.
The best working example is Google's "Did you mean" algorithm:
- Type "صاشف هس فاهس؟" (the keys for "what is this?" pressed while the Arabic layout is active)
- Google will give the result: "Did you mean: what is this?"
An example is given here.
I implemented this manually with an enum backed by a map, as follows:
package com.something.commands.text.model.enums;

import lombok.Getter;

import java.util.HashMap;
import java.util.Map;

// Pairs each English key with the character the same physical key types
// on the Arabic (Windows) layout.
@Getter
public enum EKeyAlphabet {
    // Top letter row
    Q("q", "ض"),
    W("w", "ص"),
    E("e", "ث"),
    R("r", "ق"),
    T("t", "ف"),
    Y("y", "غ"),
    U("u", "ع"),
    I("i", "ه"),
    O("o", "خ"),
    P("p", "ح"),
    // The unshifted "[" and "]" keys type these letters, not "{" and "}"
    LEFT_BRACKET("[", "ج"),
    RIGHT_BRACKET("]", "د"),
    // Home row
    A("a", "ش"),
    S("s", "س"),
    D("d", "ي"),
    F("f", "ب"),
    G("g", "ل"),
    H("h", "ا"),
    J("j", "ت"),
    K("k", "ن"),
    L("l", "م"),
    SEMICOLON(";", "ك"),
    APOSTROPHE("'", "ط"),
    // Bottom row
    Z("z", "ئ"),
    X("x", "ء"),
    C("c", "ؤ"),
    V("v", "ر"),
    B("b", "لا"), // two code points: lam + alef
    N("n", "ى"),
    M("m", "ة"),
    COMMA(",", "و"),
    DOT(".", "ز"),
    SLASH("/", "ظ"),
    TICK("`", "ذ");

    // Reverse lookup (Arabic -> English), built once when the enum is loaded.
    private static final Map<String, String> AR_TO_EN = new HashMap<>();

    static {
        for (EKeyAlphabet alphabet : values()) {
            AR_TO_EN.put(alphabet.arAlpha, alphabet.enAlpha);
        }
    }

    private final String enAlpha;
    private final String arAlpha;

    EKeyAlphabet(String enAlpha, String arAlpha) {
        this.enAlpha = enAlpha;
        this.arAlpha = arAlpha;
    }

    // Returns the English key for an Arabic character, or null if it is not mapped.
    public static String getEnAlpha(String arAlpha) {
        return AR_TO_EN.get(arAlpha);
    }
}
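The enum looks up one character at a time. Applying the same idea to a whole sentence could look roughly like the sketch below. It is standalone (no Lombok), and the class name `ArabicToEnglishKeys` and method `toEnglish` are made up for illustration. It assumes the unshifted `[`, `]` and `/` keys, adds the Arabic question mark, and leaves out the "b" key because that key types two code points.

```java
import java.util.HashMap;
import java.util.Map;

class ArabicToEnglishKeys {
    // The same physical keys, listed in the same order for both layouts.
    private static final String EN = "qwertyuiop[]asdfghjkl;'zxcvnm,./`?";
    private static final String AR = "ضصثقفغعهخحجدشسيبلاتنمكطئءؤرىةوزظذ؟";

    private static final Map<Character, Character> AR_TO_EN = new HashMap<>();

    static {
        if (EN.length() != AR.length()) {
            throw new IllegalStateException("Layout rows differ in length");
        }
        for (int i = 0; i < AR.length(); i++) {
            AR_TO_EN.put(AR.charAt(i), EN.charAt(i));
        }
    }

    // Replaces every mapped Arabic character with its English key;
    // anything unmapped (spaces, digits, ...) is copied unchanged.
    static String toEnglish(String arabicTyped) {
        StringBuilder out = new StringBuilder(arabicTyped.length());
        for (char c : arabicTyped.toCharArray()) {
            char mapped = AR_TO_EN.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEnglish("صاشف هس فاهس؟")); // what is this?
    }
}
```

Because "لا" is stored as lam followed by alef, this version always reads it as "gh", never as "b".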
However, this solution has drawbacks:
- Some mappings are ambiguous. For example, "لا" can be produced by pressing "g" then "h" on the Arabic Windows layout, or directly by pressing "b" alone.
- Mapping special characters such as "إ", " ً", " ِ", etc. by hand is tedious and error-prone.
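One way to cope with the "لا" ambiguity might be to keep both readings and let a later step (for example, a dictionary check) pick between them. The sketch below only enumerates the candidates. The class and method names are hypothetical, the per-character map is passed in by the caller, and `Map.of` assumes Java 9 or later. The number of candidates doubles with every "لا" in the input, so this is only reasonable for short strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AmbiguousKeyCandidates {
    private static final String LAM_ALEF = "\u0644\u0627"; // "لا"

    // Returns every English reading of the text, branching at each lam-alef pair.
    static List<String> candidates(String arabicTyped, Map<Character, Character> arToEn) {
        List<String> results = new ArrayList<>();
        expand(arabicTyped, 0, new StringBuilder(), arToEn, results);
        return results;
    }

    private static void expand(String text, int pos, StringBuilder prefix,
                               Map<Character, Character> arToEn, List<String> results) {
        if (pos == text.length()) {
            results.add(prefix.toString());
            return;
        }
        int mark = prefix.length();
        if (text.startsWith(LAM_ALEF, pos)) {
            // Reading 1: the single "b" key produced both code points.
            prefix.append('b');
            expand(text, pos + 2, prefix, arToEn, results);
            prefix.setLength(mark);
        }
        // Reading 2 (or the only reading): map this character on its own.
        char c = text.charAt(pos);
        char mapped = arToEn.getOrDefault(c, c);
        prefix.append(mapped);
        expand(text, pos + 1, prefix, arToEn, results);
        prefix.setLength(mark);
    }

    public static void main(String[] args) {
        // Only the keys needed for this demo.
        Map<Character, Character> arToEn =
                Map.of('ش', 'a', 'ل', 'g', 'ا', 'h', 'م', 'l', 'ث', 'e');
        System.out.println(candidates("شلامث", arToEn)); // [able, aghle]
    }
}
```

Here only "able" is an English word, so a dictionary lookup over the candidates would resolve the ambiguity in this case.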
So, my questions are:
- Is there any third-party library or API that does this?
- Is it possible to do this by converting Unicode code points directly, instead of maintaining a table?
- Is it possible to do it for most languages (i.e., not only English -> Arabic)?
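To make the last question concrete, the kind of generalisation in mind is sketched below: one row of characters per layout, indexed by physical key position, so any layout can be converted to any other. This is only a sketch under assumptions. The layout identifiers "en", "ar" and "ru" and the class name are invented, the Russian row is included purely as a second example, and keys that type more than one code point (such as Arabic "b") are skipped in every row so the positions stay aligned. It still depends on a hand-maintained table per layout.

```java
import java.util.HashMap;
import java.util.Map;

class LayoutConverter {
    // Each row lists what the same sequence of physical keys types, unshifted.
    private static final Map<String, String> LAYOUTS = new HashMap<>();

    static {
        LAYOUTS.put("en", "qwertyuiop[]asdfghjkl;'zxcvnm,.");
        LAYOUTS.put("ar", "ضصثقفغعهخحجدشسيبلاتنمكطئءؤرىةوز");
        LAYOUTS.put("ru", "йцукенгшщзхъфывапролджэячсмтьбю");
    }

    // Re-types the text as if the same keys had been pressed on another layout.
    static String convert(String text, String from, String to) {
        String source = LAYOUTS.get(from);
        String target = LAYOUTS.get(to);
        if (source == null || target == null) {
            throw new IllegalArgumentException("Unknown layout: " + from + " or " + to);
        }
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            int key = source.indexOf(c); // position of the physical key, or -1
            out.append(key >= 0 ? target.charAt(key) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("what is this", "en", "ar")); // صاشف هس فاهس
        System.out.println(convert("ntrcn", "en", "ru"));        // текст
    }
}
```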