
I'm trying to pass text in another language through my Python program, like this:

theWord = "阿麗思道"
theWord = theWord.decode('unicode-escape')
print theWord

I keep getting the following error:

UnicodeEncodeError: 'charmap' codec can't encode character u'\x98' in position 1: character maps to <undefined>

It has something to do with setting the right encoding, but I can't find anything on it. Does anyone know?

I need the characters to pass through exactly as they are, because I'm sending them to a Chinese translation program and want to get the translation back.

king
  • Are you sure you're using the correct encoding for your file? You may take a look at https://www.python.org/dev/peps/pep-0263/ – Guillaume Feb 23 '16 at 18:11
  • No, I'm not. I know it's wrong, but I don't know what else to put there. I tried utf8 as well and it doesn't work – king Feb 23 '16 at 18:12
  • If you're on some Unix-like system, just type `file your_file.py` to know its encoding. If you're using Windows, take a look at this answer: http://stackoverflow.com/a/13464816/1486118 – Guillaume Feb 23 '16 at 18:40
  • I don't know which encoding to use. I'm trying to send characters of another language through and have them look like normal characters, not random gibberish, but it just gives an error – king Feb 23 '16 at 18:44
  • Have you tried with `# -*- coding: utf-8 -*-` at the start of the file? That works for me – Copperfield Feb 24 '16 at 18:37
  • @Copperfield Yes, I do have that at the top of my file. You are able to see the Chinese characters? I'm not sure why I can't. – king Feb 24 '16 at 18:49
  • When I try to just print it out, it shows this: 阿麗æ€�é�“ , I just want the actual characters to go through – king Feb 24 '16 at 18:49

3 Answers


Something like this? (Taken from "how to print chinese word in my code.. using python" and "Python - 'ascii' codec can't decode byte".)

# coding=utf-8
theWord = "阿麗思道"
theWord = theWord.decode('utf-8').encode('utf-8')
print theWord
Community
  • UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined> – king Feb 23 '16 at 21:14
  • Try encoding back to utf-8, see updated edit. Or if you want to stay with `'unicode-escape'`, simply change `encode('utf-8')` to `encode('unicode-escape')` – chickity china chinese chicken Feb 23 '16 at 23:48
  • getting closer... the output I got now is: 阿麗æ€�é�“ ....is there a way to get the actual characters? i'm trying to send the characters through a translation api – king Feb 24 '16 at 14:03
  • with unicode escape i get... \u963f\u9e97\u601d\u9053 – king Feb 24 '16 at 14:04

I think the problem is the codec you are decoding with. Check this:

# -*- coding: utf-8 -*-

chinase = "阿麗思道"
print "original:", chinase
print "repr:", repr(chinase)
print
x = chinase.decode('unicode-escape')
print 'unicode-escape:', x
print "repr:",repr(x)
print
y = chinase.decode('utf-8')
print 'utf-8', y
print "repr",repr(y)

When I run it I get:

original: 阿麗思道
repr: '\xe9\x98\xbf\xe9\xba\x97\xe6\x80\x9d\xe9\x81\x93'

unicode-escape: é¿éºæé
repr: u'\xe9\x98\xbf\xe9\xba\x97\xe6\x80\x9d\xe9\x81\x93'

utf-8 阿麗思道
repr u'\u963f\u9e97\u601d\u9053'

So just use decode('utf-8') and it should be fine.
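
To make the difference between the two codecs concrete, here is a minimal sketch that should run on Python 2.7 or 3 (the names raw, wrong and right are only for illustration). It starts from the same bytes shown in the repr output: unicode-escape turns every byte into its own code point, while utf-8 combines each group of three bytes into one character.

```python
# The UTF-8 bytes of the four characters, copied from the repr output.
raw = b'\xe9\x98\xbf\xe9\xba\x97\xe6\x80\x9d\xe9\x81\x93'

# unicode-escape maps each byte to the code point with the same number.
wrong = raw.decode('unicode-escape')

# utf-8 reads three bytes per character for this text.
right = raw.decode('utf-8')

print(len(wrong))   # 12
print(len(right))   # 4
print(right == u'\u963f\u9e97\u601d\u9053')   # True
```

Note that wrong[1] is u'\x98', which is exactly the character the 'charmap' error in the question complains about at position 1.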

edit

Interestingly, if I run it in cmd on Windows I get the same output and the same error as you do. From that I conclude that the problem is where you run it: the Windows console works with a legacy code page rather than Unicode, so when print tries to encode the text for that device, any character missing from the code page raises UnicodeEncodeError. You have to switch to an environment with proper Unicode support, such as the IDLE that comes with Python, or work without any prints.
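
A rough sketch of that failure, assuming the console uses code page 437 (yours may differ; sys.stdout.encoding reports what Python detected). The helper can_display is a made-up name for illustration, not part of any library.

```python
import sys


def can_display(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


word = u'\u963f\u9e97\u601d\u9053'  # the four characters from the question

print(sys.stdout.encoding)          # whatever your console reports
print(can_display(word, 'cp437'))   # False: cp437 has no Chinese characters
print(can_display(word, 'utf-8'))   # True

# errors='replace' avoids the exception, but the characters are lost.
fallback = word.encode('cp437', 'replace')
```

Here fallback ends up as four question marks, so this only hides the error. It does not make the console show Chinese.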

Copperfield

Check your console encoding. It might not be UTF-8, and that could be why the characters do not print on your console. If you write the output to a UTF-8 encoded file instead, it will work.

# -*- coding: utf-8 -*-
theWord = u"阿麗思道"
# Write the UTF-8 encoded bytes to a file opened in binary mode
with open("out.txt", "wb") as fp:
    fp.write(theWord.encode('utf-8'))
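
As an alternative sketch, io.open (available on Python 2.6+ and 3) can do the encoding for you if you pass encoding='utf-8'. The temporary directory and the variable names below are assumptions for the example; any writable path works.

```python
import io
import os
import tempfile

word = u'\u963f\u9e97\u601d\u9053'  # the four characters from the question

# Hypothetical output location, chosen so the example does not touch your files.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Text mode with an explicit encoding: write unicode text, get UTF-8 on disk.
with io.open(path, 'w', encoding='utf-8') as fp:
    fp.write(word)

# Read the raw bytes back to check what was stored.
with io.open(path, 'rb') as fp:
    data = fp.read()

print(len(data))                      # 12
print(data.decode('utf-8') == word)   # True
```

Open out.txt in an editor that understands UTF-8 and the characters should show up as expected.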