
Here is a list of Tamil Unicode code points:

[u'\u0b9a', u'\u0b9f', u'\u0bcd', u'\u0b9f', u'\u0b9a', u'\u0baa', u'\u0bc8', u'\u0baf', u'\u0bbf', u'\u0bb2', u'\u0bcd', u'\u0ba8', u'\u0bc7', u'\u0bb1', u'\u0bcd', u'\u0bb1', u'\u0bc1']

How can I convert it to a readable string?

Ashwin Balamohan
  • Those are already Tamil letters. Try again. – Ignacio Vazquez-Abrams Mar 17 '12 at 05:46
  • I see that you've changed your question; you now want to display your characters "with whitespaces" -- which whitespace character(s)? how many? positioned where? Try giving an example. – John Machin Mar 17 '12 at 07:56
  • Sir, I want the Tamil Unicode code points printed as they are in the array, with whitespace. I do not want to join the contents of the array and display them as Tamil characters. – chandrakanth duraisamy Mar 17 '12 at 08:36
  • Actually, I want to tokenize Tamil words. To tokenize, the file must first be read as UTF-8 and converted to Unicode; after reading, it is tokenized, and the result is in Unicode. I want that Unicode result converted to Tamil letters, but I didn't get whitespace when tokenizing Tamil words. – chandrakanth duraisamy Mar 17 '12 at 10:02
  • I need spaces between words, not between individual characters, when tokenizing Tamil words. – chandrakanth duraisamy Mar 17 '12 at 10:04
  • @siva: You should really ask your REAL question the FIRST time up ... Edit your question. You will need to show your input and your tokenising code -- we are not mind-readers. – John Machin Mar 18 '12 at 00:04

1 Answer


No conversion needed.

    >>> alist = [
            u'\u0b9a', u'\u0b9f', u'\u0bcd', u'\u0b9f', u'\u0b9a',
            u'\u0baa', u'\u0bc8', u'\u0baf', u'\u0bbf', u'\u0bb2',
            u'\u0bcd', u'\u0ba8', u'\u0bc7', u'\u0bb1', u'\u0bcd',
            u'\u0bb1', u'\u0bc1',
            ]
    >>> print u''.join(alist)
    சட்டசபையில்நேற்று
    >>> 
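For readers on Python 3, the same join can be sketched as below. This assumes Python 3, which the session above does not use: there `str` is already Unicode, the `u` prefix is optional, and `print` is a function. The name `word` is introduced here purely for illustration.

```python
# Assumption: Python 3, where every str is a Unicode string.
alist = [
    '\u0b9a', '\u0b9f', '\u0bcd', '\u0b9f', '\u0b9a',
    '\u0baa', '\u0bc8', '\u0baf', '\u0bbf', '\u0bb2',
    '\u0bcd', '\u0ba8', '\u0bc7', '\u0bb1', '\u0bcd',
    '\u0bb1', '\u0bc1',
]

# Joining with an empty separator concatenates the code points into one string.
word = ''.join(alist)
print(word)
```

Whether the result renders as Tamil depends on the terminal's encoding and font, not on the string itself.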

Update: Perhaps you want this:

    >>> print u' '.join(alist)
    ச ட ் ட ச ப ை ய ி ல ் ந ே ற ் ற ு

or this:

    >>> import unicodedata
    >>> for c in alist:
    ...     print repr(c), c, unicodedata.category(c)
    ...
    u'\u0b9a' ச Lo
    u'\u0b9f' ட Lo
    u'\u0bcd' ் Mn
    u'\u0b9f' ட Lo
    u'\u0b9a' ச Lo
    u'\u0baa' ப Lo
    u'\u0bc8' ை Mc
    u'\u0baf' ய Lo
    u'\u0bbf' ி Mc
    u'\u0bb2' ல Lo
    u'\u0bcd' ் Mn
    u'\u0ba8' ந Lo
    u'\u0bc7' ே Mc
    u'\u0bb1' ற Lo
    u'\u0bcd' ் Mn
    u'\u0bb1' ற Lo
    u'\u0bc1' ு Mc
    >>> 
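The categories printed above suggest one way to put spaces between visible letters rather than between raw code points: attach each combining mark (category `Mn` or `Mc`) to the base letter before it. The following is a minimal sketch under that assumption, written for Python 3. The helper name `group_marks` is hypothetical, and this is not a full grapheme-cluster implementation; the Unicode text segmentation rules cover more cases than this does.

```python
import unicodedata

alist = [
    '\u0b9a', '\u0b9f', '\u0bcd', '\u0b9f', '\u0b9a',
    '\u0baa', '\u0bc8', '\u0baf', '\u0bbf', '\u0bb2',
    '\u0bcd', '\u0ba8', '\u0bc7', '\u0bb1', '\u0bcd',
    '\u0bb1', '\u0bc1',
]

def group_marks(chars):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper for illustration only.
    """
    clusters = []
    for c in chars:
        # Categories starting with "M" (Mn, Mc, Me) are combining marks.
        if clusters and unicodedata.category(c).startswith('M'):
            clusters[-1] += c
        else:
            clusters.append(c)
    return clusters

clusters = group_marks(alist)
print(' '.join(clusters))
```

For the list in the question this gives ten clusters, with each vowel sign or pulli kept on its consonant. Joining the clusters with an empty string gives back the original word.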
John Machin