Hebrew unicode in Python

Question

I'm trying to make an English-Hebrew Dictionary.
I have a dictionary in tab-separated format (<word>TAB<translation>), and in the end I want it in mobi format. I found a Python script, tab2opf.py, that converts from tab to opf (and HTML files). From there it's easy to convert to mobi.

When I run the original script on my tab (.txt) file, everything is fine.
I'm using the script with the built-in utf option: tab2opf.py -utf tab.txt

The problem is that I want the dictionary for my Kindle, and the Kindle shows the Hebrew translation backward. So I decided to edit tab2opf.py so that it reverses the translation, and the Kindle will then show it correctly.

I wrote the following code:

def RevIt(s):
    heb = []
    g = ""
    for i in range(len(s)):
        c = s[i]
        heb.append(c)
    for i in range(len(heb)):
        g += heb.pop()
    return g

and in tab2opf.py I added dd = RevIt(dd) after line 245.
Now I receive a mess:
"-բ לימלՠ£ילחתכՠ©משמהՠתՠאՠמיסՠ,)¨וביחՠמיסՠ:תכבը נסרפמאՠ,& .צעՠשՠ
For comparison, this is how the same line looks in the original txt file:
שם עצם. &, אמפרסנד (בכתב: סימן חיבור), סימן או תו המשמש כתחליף למילה "ו-"

What am I doing wrong?

this may help some ... http://stackoverflow.com/questions/3379589/strings-in-hebrew-in-python-for-s60 — Joran Beasley, Oct 15 '12 at 14:46

score 5 · Accepted Answer · answered Oct 15 '12 at 14:42

You're working with bytes instead of Unicode characters. Try this:

g = u""
s = s.decode('UTF-8')
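
To see why the byte-level reversal produces a mess, here is a small sketch (the sample word and variable names are made up for illustration, not taken from tab2opf.py). Each Hebrew letter takes two bytes in UTF-8, so reversing the raw bytes puts a continuation byte in front of its lead byte and the result is no longer valid UTF-8:

```python
# Illustrative sample: four Hebrew letters, written with escapes so the
# snippet does not depend on the source file's encoding.
word = u'\u05e9\u05dc\u05d5\u05dd'
raw = word.encode('utf-8')      # eight bytes, two per letter

byte_reversed = raw[::-1]       # what reversing the undecoded string does
char_reversed = word[::-1]      # reversal of whole Unicode characters

# Check whether the byte-reversed data can still be read as UTF-8.
try:
    byte_reversed.decode('utf-8')
    byte_reversal_is_valid = True
except UnicodeDecodeError:
    byte_reversal_is_valid = False
```

Here `byte_reversal_is_valid` ends up False, while `char_reversed` holds the same four letters in the opposite order.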

Mark Ransom

299,747
42
398
622

@JoranBeasley, yes I meant `s`. I'm assuming that the input string is raw bytes and needs to be decoded into Unicode characters. You could just as easily reverse the two lines. – Mark Ransom Oct 15 '12 at 14:47
Now I get an error `""" % (dt, dtstrip, dd)) UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 80: ordinal not in range(128) ` The txt file is already utf-8, BTW. – HaReL Oct 15 '12 at 15:55
@HaReL, the rest of your code is probably expecting bytes again. Try returning `g.encode('UTF-8')`. Your code would probably be more robust if you were working in Unicode throughout. – Mark Ransom Oct 15 '12 at 15:58
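
Putting the answer and the last comment together, a minimal sketch of the whole round trip might look like this. It assumes the value passed in is a UTF-8 byte string and that the rest of the script expects bytes back. `rev_it` is a hypothetical name, not a function in tab2opf.py:

```python
def rev_it(s):
    """Reverse a UTF-8 encoded byte string one character at a time."""
    text = s.decode('utf-8')               # bytes -> Unicode characters
    reversed_text = text[::-1]             # reverse code points, not bytes
    return reversed_text.encode('utf-8')   # back to bytes for the caller

# Illustrative input: four Hebrew letters as UTF-8 bytes.
word = u'\u05e9\u05dc\u05d5\u05dd'.encode('utf-8')
result = rev_it(word)
```

One caveat: this reverses code points, so text with combining marks (such as niqqud) or embedded Latin words and numbers may still not display as intended after a plain reversal.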
