How to convert an array of Tamil Unicode values into a Tamil string in Python, with whitespace?

Question

Here is a list of Tamil Unicode code points:

[u'\u0b9a', u'\u0b9f', u'\u0bcd', u'\u0b9f', u'\u0b9a', u'\u0baa', u'\u0bc8', u'\u0baf', u'\u0bbf', u'\u0bb2', u'\u0bcd', u'\u0ba8', u'\u0bc7', u'\u0bb1', u'\u0bcd', u'\u0bb1', u'\u0bc1']

How can I convert it to a readable string?

I see that you've changed your question; you now want to display your characters "with whitespaces" -- which whitespace character(s)? how many? positioned where? Try giving an example. — John Machin, Mar 17 '12 at 07:56
Sir, I want the Tamil Unicode values to be printed as they are in the array, with whitespace. I do not want to join the contents of the array and display them as Tamil characters. — chandrakanth duraisamy, Mar 17 '12 at 08:36
Actually, I want to tokenize Tamil words. To tokenize, the file has to be read as UTF-8 and decoded to Unicode. After reading, the text is tokenized, and the result is in Unicode. I want that Unicode result converted to Tamil letters, but I didn't get whitespace when tokenizing Tamil words. — chandrakanth duraisamy, Mar 17 '12 at 10:02
I need spaces between words, not between characters, when tokenizing Tamil words. — chandrakanth duraisamy, Mar 17 '12 at 10:04
@siva: You should really ask your REAL question the FIRST time up ... Edit your question. You will need to show your input and your tokenising code -- we are not mind-readers. — John Machin, Mar 18 '12 at 00:04

John Machin · Accepted Answer · 2012-03-17T09:44:02.267

No conversion needed.

    >>> alist = [
            u'\u0b9a', u'\u0b9f', u'\u0bcd', u'\u0b9f', u'\u0b9a',
            u'\u0baa', u'\u0bc8', u'\u0baf', u'\u0bbf', u'\u0bb2',
            u'\u0bcd', u'\u0ba8', u'\u0bc7', u'\u0bb1', u'\u0bcd',
            u'\u0bb1', u'\u0bc1',
            ]
    >>> print u''.join(alist)
    சட்டசபையில்நேற்று
    >>>
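For readers on Python 3, the same idea can be sketched as follows. This is an illustrative translation, not part of the original answer: in Python 3 every `str` is already Unicode, so the `u''` prefix is optional and `print` is a function. The names `word` and `utf8_bytes` are made up for this example.

```python
# Python 3 sketch: the list elements are already Unicode strings,
# so joining them is all that is needed to get readable text.
alist = [
    '\u0b9a', '\u0b9f', '\u0bcd', '\u0b9f', '\u0b9a',
    '\u0baa', '\u0bc8', '\u0baf', '\u0bbf', '\u0bb2',
    '\u0bcd', '\u0ba8', '\u0bc7', '\u0bb1', '\u0bcd',
    '\u0bb1', '\u0bc1',
]

word = ''.join(alist)      # one string of 17 code points
print(word)

# Encoding is only needed when writing bytes to a file or socket.
utf8_bytes = word.encode('utf-8')
```

Decoding `utf8_bytes` with `'utf-8'` gives back the same string, which is the step that matters when the text is read from a UTF-8 file before tokenizing.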

Update: Perhaps you want this:

    >>> print u' '.join(alist)
    ச ட ் ட ச ப ை ய ி ல ் ந ே ற ் ற ு

or this:

    >>> import unicodedata
    >>> for c in alist:
    ...     print repr(c), c, unicodedata.category(c)
    ...
    u'\u0b9a' ச Lo
    u'\u0b9f' ட Lo
    u'\u0bcd' ் Mn
    u'\u0b9f' ட Lo
    u'\u0b9a' ச Lo
    u'\u0baa' ப Lo
    u'\u0bc8' ை Mc
    u'\u0baf' ய Lo
    u'\u0bbf' ி Mc
    u'\u0bb2' ல Lo
    u'\u0bcd' ் Mn
    u'\u0ba8' ந Lo
    u'\u0bc7' ே Mc
    u'\u0bb1' ற Lo
    u'\u0bcd' ் Mn
    u'\u0bb1' ற Lo
    u'\u0bc1' ு Mc
    >>>
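The category column shows which code points are combining marks (`Mn`, `Mc`) and which are base letters (`Lo`). If the goal is a space between written letters rather than between raw code points, one possible approach is to attach each mark to the letter before it. The following is a minimal Python 3 sketch under that assumption; `group_clusters` is a hypothetical helper name, and this is only a rough approximation of proper grapheme cluster segmentation, not a full implementation of it.

```python
import unicodedata

def group_clusters(chars):
    """Attach each combining mark (category M*) to the preceding base letter."""
    clusters = []
    for ch in chars:
        # Categories Mn, Mc and Me all start with 'M'.
        if clusters and unicodedata.category(ch).startswith('M'):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

alist = [
    '\u0b9a', '\u0b9f', '\u0bcd', '\u0b9f', '\u0b9a',
    '\u0baa', '\u0bc8', '\u0baf', '\u0bbf', '\u0bb2',
    '\u0bcd', '\u0ba8', '\u0bc7', '\u0bb1', '\u0bcd',
    '\u0bb1', '\u0bc1',
]

clusters = group_clusters(alist)
print(' '.join(clusters))
```

With this input the 17 code points collapse into 10 clusters, so the vowel signs and the virama stay attached to their consonants instead of being printed on their own.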

Thank you, sir, for the valuable reply. — chandrakanth duraisamy, Mar 17 '12 at 06:37
