I am trying to do the same thing in Python as the Java code below does.
String decoded = new String("ä¸".getBytes("ISO8859_1"), "UTF-8");
System.out.println(decoded);
The output is the Chinese character "中".
In Python I tried various combinations of encode, decode, and bytearray, but I always got an unreadable string. I think my problem is that I don't really understand how encoding works in Java and Python. I also could not find a solution in the existing answers.
#coding=utf-8
def p(s):
    print s + ' -- ' + str(type(s))
ch1 = 'ä¸-'
p(ch1)
chu1 = ch1.decode('ISO8859_1')
p(chu1.encode('utf-8'))
utf_8 = bytearray(chu1, 'utf-8')
p(utf_8)
p(utf_8.decode('utf-8').encode('utf-8'))
#utfstr = utf_8.decode('utf-8').decode('utf-8')
#p(utfstr)
p(ch1.decode('iso-8859-1').encode('utf8'))
ä¸- -- <type 'str'>
ä¸Â- -- <type 'str'>
ä¸Â- -- <type 'bytearray'>
ä¸Â- -- <type 'str'>
ä¸Â- -- <type 'str'>
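For comparison, the Java line and the Python attempts above work in opposite directions. In Java, `getBytes("ISO8859_1")` turns text into bytes and `new String(..., "UTF-8")` turns those bytes back into text, whereas the Python code calls `decode` first and `encode` second. Here is a minimal sketch of the Java direction in Python. It assumes the third character of the Java literal is U+00AD (a soft hyphen, which is easy to lose when copying), and it uses an escaped `u''` literal so the result does not depend on the source file encoding. The variable names are only for illustration.

```python
# Sketch only: mirrors the Java line, text -> ISO-8859-1 bytes -> UTF-8 text.
# Assumption: the literal is U+00E4, U+00B8, U+00AD (soft hyphen).
mojibake = u'\xe4\xb8\xad'

raw = mojibake.encode('iso-8859-1')   # like getBytes("ISO8859_1")
decoded = raw.decode('utf-8')         # like new String(bytes, "UTF-8")

# Encoding the same text as UTF-8 instead, which is what the
# .encode('utf-8') calls above do, stores each character as two bytes.
double_encoded = mojibake.encode('utf-8')

print(repr(raw))
print(hex(ord(decoded)))              # code point of the single decoded character
print(repr(double_encoded))
```

Under that assumption `decoded` is the single character U+4E2D (中), while `double_encoded` is six bytes long and still displays as mojibake.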
Daniel Roseman's answer is really close, thank you. But it fails on my real case:
ch = 'masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤'
print ch.decode('utf-8').encode('iso-8859-1')
I got
Traceback (most recent call last):
  File "", line 1, in
  File "/apps/Python/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 19: invalid start byte
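The position in the error message can be checked by hand. A minimal sketch, assuming the source file is saved as UTF-8 (so `ã` is stored as the two bytes 0xC3 0xA3) and that `\201` is read as an octal escape for the single byte 0x81. Only the first few bytes of `ch` are rebuilt, and the names are illustrative.

```python
# Sketch: rebuild the start of the byte string `ch` and decode it as UTF-8.
# Assumptions: UTF-8 source file; '\201' is the octal escape for byte 0x81.
prefix = b'masanori harigae '            # 17 ASCII bytes, positions 0-16
data = prefix + b'\xc3\xa3' + b'\x81'    # UTF-8 bytes of 'ã', then '\201'

try:
    data.decode('utf-8')
    error_position = None
    bad_byte = None
except UnicodeDecodeError as exc:
    error_position = exc.start            # index of the offending byte
    bad_byte = data[exc.start:exc.end]

print(error_position)
print(repr(bad_byte))
```

Under those assumptions the failing byte sits at index 19, which matches the traceback. The byte string mixes UTF-8 encoded characters with raw bytes, so it is not valid UTF-8 as a whole and `decode('utf-8')` cannot be the first step.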
Java code:
String decoded = new String("masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤".getBytes("ISO8859_1"), "UTF-8");
System.out.println(decoded);
The output is masanori harigae のパーソナル会�-�室
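The two `�` characters in the Java output suggest that one byte of the text was altered before it reached this literal: `è-°` contains a plain hyphen where a UTF-8 sequence would need a continuation byte (possibly a soft hyphen, U+00AD, lost in copying; that part is a guess). The Java output shows `�` in place of the malformed bytes, and a rough Python counterpart is the `'replace'` error handler. A minimal sketch, written as a text (`u''`) literal and reasoned through for Python 3; the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
# Sketch: same text as the Java literal. Inside a text literal, octal escapes
# such as \201 become the code points U+0081 etc.
text = u'masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤'

raw = text.encode('iso-8859-1')            # every character fits in one byte
decoded = raw.decode('utf-8', 'replace')   # malformed bytes become U+FFFD

# Print code points of the tail so the result is visible on any terminal.
print([hex(ord(c)) for c in decoded[-5:]])
```

With the default `'strict'` handler the same `decode` call raises `UnicodeDecodeError` at the `è` byte, so the choice of handler decides whether the damaged character is reported or silently replaced.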