How to convert 8-bit Hebrew to UTF-8 in Python

Question

I have Hebrew data in which the byte \xe0 is the Hebrew letter aleph, and I wish to convert it to UTF-8.

You might also want to take a look at [that](http://stackoverflow.com/questions/368805/python-unicodedecodeerror-am-i-misunderstanding-encode/370199#370199) answer. Also, note that your strings are most probably encoded as `cp1255` (see [here](http://en.wikipedia.org/wiki/Windows-1255) ), not `iso8859-8`. — tzot, Mar 06 '11 at 21:59

paprika · Accepted Answer · 2011-02-13T07:29:56.373

7

In general, if you have a byte string in Python, you first need to decode it to the internal (Unicode) representation; afterwards you can encode it to UTF-8. Of course, you need to know which encoding \xe0 is in for this to work (I assume your character is encoded using ISO-8859-8):

b'\xe0'.decode('iso-8859-8').encode('utf-8')  # the b prefix makes this a byte string on Python 3 as well

EDIT: A side note:

Make sure to use the internal representation in your program as long as possible. In general: decode first (on input), encode last (on output).
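A minimal sketch of that decode-on-input, encode-on-output pattern, assuming Python 3 and ISO-8859-8 source data. The function name `reencode_stream` and the in-memory streams are made up for illustration; they are not from the answer above.

```python
import io

def reencode_stream(src, dst, src_encoding="iso-8859-8"):
    """Read bytes from src, decode them once, and write UTF-8 bytes to dst."""
    text = src.read().decode(src_encoding)   # decode first (on input)
    # ... work with `text` as Unicode everywhere in between ...
    dst.write(text.encode("utf-8"))          # encode last (on output)
    return text

# In-memory stand-ins for real files opened in binary mode.
src = io.BytesIO(b"\xe0\xe1\xe2")   # aleph, bet, gimel in ISO-8859-8
dst = io.BytesIO()
text = reencode_stream(src, dst)
```

With real files, opening them in binary mode ("rb" / "wb") and passing the file objects in should behave the same way.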

edited Feb 13 '11 at 07:29

answered Feb 13 '11 at 07:24

paprika


1

Encoding is probably iso-8859-8, if not Windows-1255. – Mikel Feb 13 '11 at 07:30

score 0 · Answer 2 · answered Feb 13 '11 at 07:27

0

You can use the decode call to turn it into a unicode string:

y = x.decode('iso8859-8')

where x is your 8-bit string and y is the unicode string. You can then convert it to UTF-8 using the encode call:

z = y.encode('utf-8')
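As a rough check of these two steps, and of the point raised in the comments about ISO-8859-8 versus Windows-1255, here is a small sketch assuming Python 3. The helper name `hebrew_to_utf8` is hypothetical. The Hebrew letters sit at the same byte values in both codecs, but vowel points such as 0xC0 are only defined in cp1255.

```python
def hebrew_to_utf8(x, encoding="iso8859-8"):
    """Decode an 8-bit Hebrew byte string, then re-encode it as UTF-8."""
    y = x.decode(encoding)      # bytes -> Unicode text
    z = y.encode("utf-8")       # Unicode text -> UTF-8 bytes
    return z

# The letter aleph is byte 0xE0 in both codecs.
aleph_iso = hebrew_to_utf8(b"\xe0", "iso8859-8")
aleph_win = hebrew_to_utf8(b"\xe0", "cp1255")

# Byte 0xC0 is a vowel point in cp1255 but is undefined in ISO-8859-8.
try:
    hebrew_to_utf8(b"\xc0", "iso8859-8")
    iso_accepts_points = True
except UnicodeDecodeError:
    iso_accepts_points = False
sheva_win = hebrew_to_utf8(b"\xc0", "cp1255")
```

If decoding with one codec raises UnicodeDecodeError on your data, that is a hint the data may be in the other one.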

answered Feb 13 '11 at 07:27

6502

