3

I have Hebrew data in which the byte \xe0 is the Hebrew letter aleph, and I want to convert it to UTF-8.

  • You might also want to take a look at [that](http://stackoverflow.com/questions/368805/python-unicodedecodeerror-am-i-misunderstanding-encode/370199#370199) answer. Also, note that your strings are most probably encoded as `cp1255` (see [here](http://en.wikipedia.org/wiki/Windows-1255)), not `iso8859-8`. – tzot Mar 06 '11 at 21:59

2 Answers

7

In general in Python, if you have a byte string, you first need to decode it to get the internal (Unicode) representation; afterwards you can encode it to UTF-8. Of course, you need to know the encoding of \xe0 for this to work (I assume your character is encoded using ISO-8859-8):

b'\xe0'.decode('iso-8859-8').encode('utf-8')

EDIT: A side note:

Make sure to use the internal representation in your program as long as possible. In general: decode first (on input), encode last (on output).
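As a rough sketch of the "decode first, encode last" rule: the helper name `reencode_to_utf8` and the `SOURCE_ENCODING` constant below are made up for illustration, and the source encoding is an assumption you should check against your data (it may well be `cp1255` rather than `iso-8859-8`).

```python
# Assumed source encoding; swap in "cp1255" if that matches your data.
SOURCE_ENCODING = "iso-8859-8"


def reencode_to_utf8(raw, source_encoding=SOURCE_ENCODING):
    """Decode raw bytes on input, then encode to UTF-8 on output."""
    # Decode first: bytes -> internal Unicode text.
    text = raw.decode(source_encoding)
    # ... any processing should happen here, on the Unicode text ...
    # Encode last: Unicode text -> UTF-8 bytes.
    return text.encode("utf-8")


# The byte 0xE0 is aleph (U+05D0), which is the two bytes D7 90 in UTF-8.
utf8_bytes = reencode_to_utf8(b"\xe0")
```

For this particular byte, `iso-8859-8` and `cp1255` give the same result; they differ for other byte values, so the choice still matters for real data.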

paprika
0

You can use the `decode` method to convert it to Unicode:

y = x.decode('iso8859-8')

where `x` is your 8-bit string and `y` is the resulting Unicode string. You can then convert it to UTF-8 using the `encode` method:

z = y.encode('utf-8')
6502