
I am working on an NLP project and need to transliterate a Tamil string into an English string (in Tanglish form), for example "இல்லை" to "illai".

How can I do that in Java? A code sample would help.

ti7
Anutharsha
  • You may have some luck using the [Google Translate API](http://stackoverflow.com/a/16325094/4541045) – ti7 Jan 05 '17 at 14:44
  • That really depends: are those characters directly translatable to values in English, or does the English version change based on a set of rules? – johnny 5 Jan 05 '17 at 14:56
  • You seem to be looking for a transliteration, converting Tamil characters to the Roman alphabet. Search on 'transliteration' rather than translation, which is a different thing. – rossum Jan 05 '17 at 15:09
  • I want to convert a Tamil word to a Tanglish word. For example, if the input is அம்மா, the output should be "amma". – Anutharsha Jan 05 '17 at 19:15
  • I need a code snippet. – Anutharsha Jan 06 '17 at 03:46
  • @Anutharsha: Post the code you tried and ask a question about it; this is not the place to get code written for you. Tools for this purpose already exist, so search for them. To try it on your own, define two arrays, one with Tamil characters and another with the corresponding Latin letters, and search and replace in code to convert. There is a sample mapping at http://ccat.sas.upenn.edu/plc/tamilweb/trans/tamilunicode.html . Be careful with the order of the mappings. If you list அ, ஆ and `a`, `aa` in that order and replace, a text like `aam` will be converted to அஅம் instead of ஆம். So always place நெடில் (long vowels) before குறில் (short vowels) in the array. – SibiCoder Jan 19 '17 at 15:26

1 Answer


As there are only 72 characters in the Tamil Unicode block, build a translation table, then build a new string by checking whether each character can be translated before appending it to the output.

For example, U+0B87 (இ) becomes i.
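A minimal sketch of the table approach, under some assumptions of mine: the class name `TamilTransliterator`, the method name `transliterate`, and the particular Latin spellings (for example `zh` for ழ and `aa` for long a) are illustrative choices, not a standard scheme. One detail a plain per-character table misses is that a Tamil consonant letter carries an inherent "a", which the pulli (U+0BCD) removes and a dependent vowel sign replaces, so the sketch looks one character ahead after each consonant. Characters without a mapping are copied through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

final class TamilTransliterator {
    // Pulli (virama): suppresses the inherent "a" of the preceding consonant.
    private static final char PULLI = '\u0BCD';

    // Independent vowels, U+0B85 (a) through U+0B94 (au).
    private static final Map<Character, String> VOWELS = table(
        "\u0B85\u0B86\u0B87\u0B88\u0B89\u0B8A\u0B8E\u0B8F\u0B90\u0B92\u0B93\u0B94",
        "a", "aa", "i", "ii", "u", "uu", "e", "ee", "ai", "o", "oo", "au");

    // Dependent vowel signs, which replace the inherent "a" of a consonant.
    private static final Map<Character, String> SIGNS = table(
        "\u0BBE\u0BBF\u0BC0\u0BC1\u0BC2\u0BC6\u0BC7\u0BC8\u0BCA\u0BCB\u0BCC",
        "aa", "i", "ii", "u", "uu", "e", "ee", "ai", "o", "oo", "au");

    // Consonants; the Latin spellings are an assumed, informal Tanglish scheme.
    private static final Map<Character, String> CONSONANTS = table(
        "\u0B95\u0B99\u0B9A\u0B9C\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BA9\u0BAA"
            + "\u0BAE\u0BAF\u0BB0\u0BB1\u0BB2\u0BB3\u0BB4\u0BB5\u0BB7\u0BB8\u0BB9",
        "k", "ng", "ch", "j", "nj", "t", "n", "th", "n", "n", "p",
        "m", "y", "r", "r", "l", "l", "zh", "v", "sh", "s", "h");

    // Pairs the i-th character of chars with the i-th Latin string.
    private static Map<Character, String> table(String chars, String... latin) {
        Map<Character, String> map = new HashMap<>();
        for (int i = 0; i < chars.length(); i++) {
            map.put(chars.charAt(i), latin[i]);
        }
        return map;
    }

    static String transliterate(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (VOWELS.containsKey(c)) {
                out.append(VOWELS.get(c));
            } else if (CONSONANTS.containsKey(c)) {
                out.append(CONSONANTS.get(c));
                char next = i + 1 < input.length() ? input.charAt(i + 1) : '\0';
                if (next == PULLI) {
                    i++;                        // bare consonant, no vowel
                } else if (SIGNS.containsKey(next)) {
                    out.append(SIGNS.get(next)); // explicit vowel sign
                    i++;
                } else {
                    out.append('a');            // inherent vowel
                }
            } else {
                out.append(c);                  // not in any table: keep as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The word from the question, written with Unicode escapes.
        System.out.println(transliterate("\u0B87\u0BB2\u0BCD\u0BB2\u0BC8")); // prints "illai"
    }
}
```

Under this scheme அம்மா comes out as "ammaa" rather than "amma", because the long vowel sign is spelled `aa`; a looser Tanglish spelling would need long vowels mapped to single letters, at the cost of losing the length distinction. Vowel signs stored in decomposed form (two code points) are not handled here.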

If you are more familiar with Java or have a very large amount of material to translate, there are likely a few optimizations that would speed this up, but I suspect the above will be the basis of a good solution.

If you only have a small amount of material to translate or this is a one-off job, it may make more sense to simply use Google Translate and read the transliteration of the input shown below the input box.

ti7