
I'm trying to transliterate romanized English words into Urdu, much like this website tool. I'm using the ICU4J transliterator, but the output is a little unexpected, e.g.:

Input: "namaz"

Output: "نَمَز"

Expected output: "نماز"

English translation: "Prayer"

Below is the ID I use to get the instance:

String id = "Eng-ur; NFD;";

Does anybody know what the problem is with my ID string?

Muzammil Husnain

2 Answers


ICU’s rule framework doesn’t work well with source languages that have irregular pronunciation. Sadly, English spelling is a particularly unreliable guide to how words are pronounced.

Transliteration means emulating the pronunciation of the source language in a target language. This consists of two parts: (a) Converting input to an intermediate representation that indicates the pronunciation; (b) converting the pronunciation to the final output.

With English-to-Urdu, the rule-based ICU framework will never give good results for (a), but it would very likely be a good system for doing (b). I’d recommend running your English strings through a text-to-speech system, or at least looking up the input in a very large pronunciation dictionary. This will give you pronunciations in the International Phonetic Alphabet. Once you have pronunciations, ICU should work reasonably well to generate Urdu.
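The two-part split described above can be illustrated with a toy sketch in plain Java (no ICU4J, so it runs on the JDK alone). Everything in it is hypothetical: the one-entry pronunciation dictionary stands in for a real lexicon or text-to-speech front end for part (a), the approximate IPA transcription of "namaz" is an assumption, and the five-rule IPA-to-Urdu table stands in for the IPA-to-Urdu rules that ICU does not yet have for part (b). It only shows the shape of the pipeline: look up a pronunciation first, then apply longest-match rules to that pronunciation rather than to the English spelling.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Toy two-stage transliterator. Both tables are hypothetical stand-ins:
// a real system would use a large pronunciation lexicon or a text-to-speech
// front end for stage (a), and a complete rule set for stage (b).
class TwoStageSketch {
    // Stage (a): spelling -> IPA. One hypothetical, approximate entry:
    // n, schwa, m, open back a with length mark, z.
    static final Map<String, String> PRONUNCIATIONS =
            Map.of("namaz", "n\u0259m\u0251\u02D0z");

    // Stage (b): IPA -> Urdu. Rules are tried in insertion order, so the
    // two-character long vowel is listed before the single symbols.
    static final Map<String, String> IPA_TO_URDU = new LinkedHashMap<>();
    static {
        IPA_TO_URDU.put("\u0251\u02D0", "\u0627"); // long a -> alef
        IPA_TO_URDU.put("\u0259", "");             // short schwa -> left unwritten
        IPA_TO_URDU.put("n", "\u0646");            // noon
        IPA_TO_URDU.put("m", "\u0645");            // meem
        IPA_TO_URDU.put("z", "\u0632");            // zain
    }

    static String toIpa(String word) {
        String ipa = PRONUNCIATIONS.get(word.toLowerCase(Locale.ROOT));
        if (ipa == null) {
            throw new IllegalArgumentException("no pronunciation for: " + word);
        }
        return ipa;
    }

    static String ipaToUrdu(String ipa) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        outer:
        while (i < ipa.length()) {
            for (Map.Entry<String, String> rule : IPA_TO_URDU.entrySet()) {
                if (ipa.startsWith(rule.getKey(), i)) {
                    out.append(rule.getValue());
                    i += rule.getKey().length();
                    continue outer;
                }
            }
            throw new IllegalArgumentException("no rule at: " + ipa.substring(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints noon, meem, alef, zain: the expected output from the question.
        System.out.println(ipaToUrdu(toIpa("namaz")));
    }
}
```

The point of the design is that the long vowel is already explicit in the pronunciation, so stage (b) can be a simple, regular rule table; working from the spelling "namaz" alone, a rule table has no way to know which "a" is long.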

Now, ICU doesn’t yet have rules for converting the International Phonetic Alphabet to Urdu. As the maintainer of Unicode’s transliteration rules, I think this should be very easy to implement; I’ll gladly do it when I find some time (but anyone is welcome to send patches!). Please file a bug at http://unicode.org/cldr/trac/newticket if you want to go this route.


I don't think there's a problem with your ID string per se. (Probably en-ur is sufficient, though; why request NFD?) I note that the string nmạz transliterates exactly to نماز. Perhaps there is room for improvement in the transliteration rules?
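The difference between the two inputs can be mimicked with a tiny letter table in plain Java. This is a hypothetical, heavily reduced stand-in, not ICU's actual Latin-to-Arabic rules: it only encodes the mappings implied by the two results reported in this thread, namely that a plain "a" comes out as a fatha (a short-vowel mark) while "ạ" (a with dot below) comes out as alef. It also assumes the precomposed character U+1EA1; a decomposed "a" plus combining dot would not match this toy table.

```java
import java.util.Map;

// Hypothetical, heavily reduced letter table. It reproduces only the two
// results reported in this thread and is NOT ICU's actual rule set.
class LetterTableSketch {
    static final Map<Character, String> RULES = Map.of(
            'n', "\u0646",      // noon
            'm', "\u0645",      // meem
            'z', "\u0632",      // zain
            'a', "\u064E",      // plain a -> fatha, a short-vowel mark
            '\u1EA1', "\u0627"  // a with dot below (precomposed) -> alef
    );

    static String transliterate(String latin) {
        StringBuilder out = new StringBuilder();
        for (char c : latin.toCharArray()) {
            // Characters without a rule are passed through unchanged.
            out.append(RULES.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("namaz"));      // noon+fatha, meem+fatha, zain
        System.out.println(transliterate("nm\u1EA1z")); // noon, meem, alef, zain
    }
}
```

In other words, the romanization the rules expect marks long "a" explicitly, whereas everyday romanized Urdu such as "namaz" does not.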

Steven R. Loomis
  • Thanks for your reply. I used NFD to remove any accent characters from the input, but even if I remove NFD it still gives the same output. Secondly, how can I improve the transliteration rules? I don't know what transliteration rules are or how to improve them, because I think they are managed by icu4j, if I'm right. – Muzammil Husnain Nov 23 '16 at 16:39
  • NFD is not for removing accent characters; it is just a decomposition. Do you mean you want to remove the vowel marks? That would be a different rule. But yes, the icu4j data comes from CLDR, http://cldr.unicode.org – Steven R. Loomis Nov 24 '16 at 00:45
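The point about NFD can be checked with a small sketch. It uses the JDK's java.text.Normalizer instead of ICU4J, an assumption made only so it runs without extra dependencies. NFD merely splits a precomposed character such as ạ into a base letter plus a combining mark; removing marks is a separate step. On the ICU side, the ICU user guide describes a compound ID along the lines of "NFD; [:Nonspacing Mark:] Remove; NFC" for that extra step. The last line shows that stripping the vowel marks from the question's output still does not produce the expected word.

```java
import java.text.Normalizer;

class NfdSketch {
    // NFD only decomposes: precomposed U+1EA1 becomes 'a' followed by
    // the combining dot below U+0323. Nothing is removed.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    // Removing marks is a separate step: decompose, delete every
    // nonspacing mark (general category Mn), then recompose with NFC.
    // Arabic vowel marks such as fatha are also category Mn.
    static String stripMarks(String s) {
        String withoutMarks = nfd(s).replaceAll("\\p{Mn}", "");
        return Normalizer.normalize(withoutMarks, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(nfd("\u1EA1").length());  // 2: base letter + combining mark
        System.out.println(stripMarks("nm\u1EA1z")); // nmaz
        // Stripping the fathas from the question's output leaves noon, meem,
        // zain: still no alef, so this alone does not give the expected word.
        System.out.println(stripMarks("\u0646\u064E\u0645\u064E\u0632").length()); // 3
    }
}
```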