Converting a list of unicode characters into a Hebrew string in Python

Question

Following the solution in this thread, I have managed to get a bunch of lists that each look like:

[u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9']

I assume that those are unicode characters, but for some reason I can't convert them back into Hebrew.

I tried the suggested solution in the comments in the link. I also tried to use ''.join but it didn't work. The error I get is:

Error Type: exceptions.UnicodeEncodeError 22:42:15 T:2806414192
M:2425589760 ERROR: Error Contents: 'ascii' codec can't encode
characters in position 0-4: ordinal not in range(128)

I tried to wrap stuff in unicode() but all I got is the same as the example above.

How do I achieve that?

Note:
I am trying to parse this link.

Edit:
I am trying to convert the list into a string using join and then print it. Here is the relevant piece of code:

soup = BeautifulStoneSoup(link, convertEntities=BeautifulStoneSoup.XML_ENTITIES)
programs = soup('ul')
for i, prog in enumerate(programs):
    if i == (4 + getLetterValue(name)):
        j = 0
        while j < len(prog('li')):
            li = prog('li')[j]
            link = li('a')[0]
            url = link['href']
            text = link.contents
            print ''.join(text)  # this is the line that raises UnicodeEncodeError
            j += 1  # advance to the next <li>; without this the loop never ends

Here link is a string, and getLetterValue(name) returns an integer that gives the position of the relevant list in the HTML document.

What do you mean by "convert them back into Hebrew."? E.g. want to write them into a utf-8 encoded file? — bpgergo, Aug 29 '11 at 19:51
That already *is* a unicode string in that list, hence the `u'...`. Please elaborate what you mean by "convert them back into Hebrew". — Ross Patterson, Aug 29 '11 at 19:51
can you post some code for what you are trying to do? Assigning the list above to a variable and printing it gives תאמין לי which looks like hebrew to me... — Fredrik Pihl, Aug 29 '11 at 19:51
For me this prints fine `[u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'] >>> print l[0] תאמין לי` — bpgergo, Aug 29 '11 at 19:51
I want to display them on the screen via an xbmc.org plugin. For now, the problem is with print, which in effect prints the text to a file and not to the screen — Yotam, Aug 29 '11 at 19:53
Please include a code sample of how you use a different string. — Ross Patterson, Aug 29 '11 at 19:54
That's not a code sample of how you'd use a different string to do what you want to do. IOW, how would you normally put a string on the screen that isn't working with this string? — Ross Patterson, Aug 29 '11 at 20:00
@Ross Patterson: I'm not sure what you meant. The solution you wrote for me doesn't work. This could originate in the way that xbmc handles strings. — Yotam, Aug 29 '11 at 20:09

score 3 · Accepted Answer · edited May 23 '17 at 12:12

This is already a unicode string, and it is in Hebrew. You can even print it directly in a Python interactive shell, e.g.:

>>> print u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'
תאמין לי

If you really need to convert it to a raw string of bytes (a str object) for some reason, you have to specify the encoding of the byte string, because text can be represented in many different encodings.
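To illustrate why the encoding has to be named, here is a minimal sketch that encodes the string from the question in two ways and also tries ASCII. The variable names are mine, and cp1255 is only one example of a Hebrew-capable encoding, not something the question requires. The ASCII attempt fails in the same way as the error quoted in the question, which suggests that something in the print path is encoding with the default ASCII codec.

```python
# The text from the question: 8 characters (7 Hebrew letters and a space).
text = u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9'

# The same text produces different bytes depending on the encoding.
utf8_bytes = text.encode('utf-8')      # two bytes per Hebrew letter
cp1255_bytes = text.encode('cp1255')   # one byte per Hebrew letter

# ASCII has no Hebrew letters, so encoding with it raises an error.
try:
    text.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc  # covers positions 0-4, the first Hebrew word
```

The two byte strings differ in both length and content, so whoever reads them has to know which encoding was used.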

Short answer: assuming you want to use UTF-8 to encode the text, you can use:

your_unicode_text.encode('utf-8')

If you are going to use a different encoding, just change the encoding name above.
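Applied to the list from the question, this could look like the sketch below. It assumes the list holds only unicode strings, as in the example shown, and that whatever consumes the output (the XBMC log, in this case) accepts UTF-8 bytes; I have not verified that for XBMC. The names parts, joined and encoded are illustrative.

```python
# A list like the one in the question (in the real code it comes from link.contents).
parts = [u'\u05ea\u05d0\u05de\u05d9\u05df \u05dc\u05d9']

# Join with a unicode separator so the result stays a unicode string.
joined = u''.join(parts)

# Encode explicitly instead of letting the output fall back to ASCII.
encoded = joined.encode('utf-8')

# Decoding with the same encoding gives back the original text.
round_trip = encoded.decode('utf-8')
```

In Python 2, passing encoded rather than joined to print avoids the implicit ASCII conversion that raises UnicodeEncodeError.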

For a reference on how Python deals with Unicode text and common problems, see: http://docs.python.org/howto/unicode.html

See also this answer for another short explanation of Unicode and string encodings.
