3

Suppose we have a name written in a non-Latin script, in a language such as Arabic, Hebrew, Chinese, or Japanese.

How could a search engine match the original name with the English spelling of the same name, and vice versa?

For example, the name 拓海 in Japanese and its English spelling, Takumi.

What algorithm or technique is used to do this?

Dmitry Zagorulkin
EzzatA
  • 3
    Won't the search engine just [translate it](http://translate.google.com/#auto|en|%E6%8B%93%E6%B5%B7) and then search for both keywords? I am not sure I understand what you're asking. – Danny Jul 05 '12 at 13:05
  • That's why I said "name", not "word": the same pronunciation written in different languages, which cannot be translated! – EzzatA Jul 05 '12 at 13:11
  • But if you click my link it seems that translating the name is not an issue. Google Translate can translate the name. – Danny Jul 05 '12 at 13:17
  • Hmm, so do you think they just convert the name into the equivalently pronounced characters in the other language? – EzzatA Jul 05 '12 at 13:27
  • 1
    See [this](http://stackoverflow.com/questions/4203299/sorting-multi-locale-strings-in-java) post. There is something about multi-language sorting – rtruszk Dec 18 '14 at 09:37

2 Answers

2

Good day.

You have to do the following:

Classify the alphabet of every language, and map the symbols that correspond to each other:

All languages:

  • English [26 letters] a b c d e f g ...
  • Russian [33 letters] а б в г д е ....
  • Chinese [x letters] ....
  • Ukrainian [x letters] а б в г д ..... і
  • Japanese [x letters] ...
  • .................

In the end you will have rules that map the spelling of any symbol between languages. Some languages, for instance Hindi and Chinese, will not have any direct rules; you should create your own rules for them (based on the transcription of these languages).

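Algorithm: tag each symbol with the language it belongs to, then replace every non-English symbol with its transcription:

[w][e][п] = wep

 e  e  r

where e = English, r = Russian, and transcription[п] = p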
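The per-symbol idea above can be sketched in Python. This is only an illustration, not a complete solution: the table `CYRILLIC_TO_LATIN` and the helper names `classify` and `transliterate` are assumptions made up for this example, and the table covers just a handful of Russian letters.

```python
import string

# Hypothetical, deliberately tiny transcription table (assumption for this sketch).
# A real system would need a complete table for every supported script.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "п": "p",
}


def classify(ch):
    """Tag one symbol: 'e' for English, 'r' for Russian, '?' for anything else."""
    if ch in string.ascii_letters:
        return "e"
    if ch.lower() in CYRILLIC_TO_LATIN:
        return "r"
    return "?"


def transliterate(text):
    """Replace every symbol tagged 'r' with its Latin transcription.

    English and unknown symbols are copied unchanged. Upper-case Russian
    letters come out in lower case, because the table is lower-case only.
    """
    out = []
    for ch in text:
        if classify(ch) == "r":
            out.append(CYRILLIC_TO_LATIN[ch.lower()])
        else:
            out.append(ch)
    return "".join(out)


print([classify(ch) for ch in "weп"])  # ['e', 'e', 'r']
print(transliterate("weп"))            # wep
```

A one-symbol-to-one-symbol table like this only works for alphabetic scripts. For Chinese or Japanese names the rules would have to operate on readings rather than on single characters, which is why the answer says you need your own transcription rules there.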

Dmitry Zagorulkin
0

Search engines (like Google) probably have huge data sets (corpora), one corpus per language.

To translate a word from one language into another, you look the word up in the corpus of the first language and return the corresponding word from the corpus of the second language. (The same technique applies to names.)

That's the basic idea.
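A minimal sketch of that lookup in Python, assuming the name pairs have already been extracted from aligned corpora. The list `NAME_PAIRS` and the functions `build_index` and `expand_query` are hypothetical names for this example; nothing here describes how any real search engine is implemented.

```python
from collections import defaultdict

# Hypothetical aligned pairs (original script, Latin spelling). In practice
# these would be mined from large multilingual corpora, not written by hand.
NAME_PAIRS = [
    ("拓海", "Takumi"),
    ("Дмитрий", "Dmitry"),
    ("محمد", "Muhammad"),
    ("محمد", "Mohamed"),  # one name can have several Latin spellings
]


def build_index(pairs):
    """Build a two-way index: original -> Latin spellings and Latin -> originals."""
    index = defaultdict(set)
    for original, latin in pairs:
        index[original].add(latin)
        # Latin keys are case-folded so that "takumi" also matches "Takumi".
        index[latin.casefold()].add(original)
    return index


def expand_query(query, index):
    """Return the query plus every known spelling of it in the other script."""
    return {query} | index.get(query, set()) | index.get(query.casefold(), set())


INDEX = build_index(NAME_PAIRS)
print(expand_query("拓海", INDEX))
print(expand_query("takumi", INDEX))
```

The search engine would then search for every string in the expanded set, so a query typed in either script can find documents written in the other.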

For some background, it is worth reading about the NLP field here: http://en.wikipedia.org/wiki/Natural_language_processing

barak1412