I have Hebrew data in which the byte `\xe0` is the Hebrew letter aleph, and I want to convert it to UTF-8.
Viewed 2,741 times
You might also want to take a look at [that](http://stackoverflow.com/questions/368805/python-unicodedecodeerror-am-i-misunderstanding-encode/370199#370199) answer. Also, note that your strings are most probably encoded as `cp1255` (see [here](http://en.wikipedia.org/wiki/Windows-1255)), not `iso8859-8`. – tzot Mar 06 '11 at 21:59
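One way to check which of the two encodings fits the data is to try decoding a sample with each. The sketch below is only an illustration for Python 3: `try_decode` is a hypothetical helper, and the sample bytes are made up. It relies on Python's `iso-8859-8` codec rejecting bytes that the standard leaves unassigned, whereas `cp1255` assigns vowel points to that range.

```python
def try_decode(raw, encodings=("iso-8859-8", "cp1255")):
    """Return {encoding: decoded text, or None if decoding failed}."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # The byte sequence is not valid in this encoding.
            results[name] = None
    return results

# Aleph alone decodes the same way under both encodings.
aleph_only = try_decode(b"\xe0")

# 0xC0 is a vowel point (sheva) in cp1255 but unassigned in ISO-8859-8,
# so only cp1255 can decode this made-up sample.
with_point = try_decode(b"\xe0\xc0")
```

If a real sample decodes under only one of the candidates, that is a reasonable hint about the encoding, though not proof.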
2 Answers
In general in Python, if you have a byte string, you first need to decode it into the internal (Unicode) representation; afterwards you can encode it to UTF-8. Of course, you need to know the encoding of `\xe0` for this to work (I assume your character is encoded using ISO-8859-8):
b'\xe0'.decode('iso-8859-8').encode('utf-8')
EDIT: A side note:
Make sure to use the internal representation in your program as long as possible. In general: decode first (on input), encode last (on output).
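As a rough sketch of "decode on input, encode on output" applied to files, assuming Python 3: `convert_file` and the file names are hypothetical, and the source encoding is a guess that should be replaced with whatever the data really uses.

```python
import os
import tempfile

def convert_file(src_path, dst_path, src_encoding="iso-8859-8"):
    """Re-encode a text file from src_encoding to UTF-8."""
    # Decode on input: the file object returns str, not bytes.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # Any processing would happen here, on the decoded text.
    # Encode on output: the file object turns str back into bytes.
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return text

# Demo with a throwaway file holding aleph and bet in ISO-8859-8.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "hebrew_in.txt")
    dst_path = os.path.join(tmp, "hebrew_out.txt")
    with open(src_path, "wb") as f:
        f.write(b"\xe0\xe1")
    decoded = convert_file(src_path, dst_path)
    with open(dst_path, "rb") as f:
        utf8_bytes = f.read()
```

Passing `newline=""` keeps line endings untouched, so only the character encoding changes.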

paprika
You can use the `decode` method to convert it to a Unicode string:
y = x.decode('iso8859-8')
where `x` is your 8-bit string and `y` is the Unicode string. You can then convert it to UTF-8 using the `encode` method:
z = y.encode('utf-8')
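The two steps above could be wrapped in a small helper. This is a minimal sketch for Python 3, where the input must be a `bytes` object; `to_utf8` is a hypothetical name, and the default source encoding is an assumption taken from this answer.

```python
def to_utf8(raw, source_encoding="iso-8859-8"):
    """Decode raw bytes from source_encoding, then re-encode as UTF-8."""
    text = raw.decode(source_encoding)  # bytes -> str
    return text.encode("utf-8")         # str -> UTF-8 bytes

aleph_utf8 = to_utf8(b"\xe0")  # b'\xd7\x90'
```

If the data turns out to be Windows-1255, calling `to_utf8(raw, "cp1255")` applies the same two steps with that encoding instead.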

6502