
I'm trying to print: Pokémon GO Việt Nam

print u"Pokémon GO Việt Nam"

and I'm getting:

print u"PokÚmon GO Vi?t Nam"
SyntaxError: (unicode error) 'utf8' codec can't decode byte 0xe9 in position 0: unexpected end of data

I've tried:

.encode("utf-8")
.decode("utf-8")
.decode('latin-1').encode("utf-8")
unicode(str.decode("iso-8859-4"))

My Python version is 2.7.9 and I'm editing in Notepad++ with UTF-8 encoding, with no luck. How can I print it? I run into this kind of issue all the time; what's the proper way to debug it and find the right encoding?
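One way to approach the debugging part of the question is to check three things separately: which encoding the output stream uses, whether the string holds the code points you expect, and which of those characters the target encoding can represent. The sketch below does that; `code_points` and `unencodable` are hypothetical helper names, and cp1252 is only an assumed Windows code page, so substitute whatever the first printed line reports.

```python
import sys
import unicodedata


def code_points(text):
    """Return (character, "U+XXXX", Unicode name) for each character in text."""
    return [(ch, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
            for ch in text]


def unencodable(text, encoding):
    """Return the characters of text that `encoding` cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


if __name__ == "__main__":
    # \u escapes keep this file pure ASCII, so the file's own encoding can't interfere.
    text = u"Pok\u00e9mon GO Vi\u1ec7t Nam"

    # 1. Which encoding will output to this stream use? (Can be None if redirected.)
    print("stdout encoding: %s" % getattr(sys.stdout, "encoding", None))

    # 2. Does the string hold the code points you expect? Show the non-ASCII ones.
    for ch, cp, name in code_points(text):
        if ord(ch) > 127:
            print("%s %s" % (cp, name))

    # 3. Which characters can the target encoding not represent? (cp1252 is an assumption.)
    missing = unencodable(text, "cp1252")
    print("missing from cp1252: %s" % ", ".join("U+%04X" % ord(ch) for ch in missing))
```

Keeping the test string as escapes rather than literal characters separates "is my source file decoded correctly?" from "can my console show this?", which are the two failures this thread runs into.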

3 Answers

#!/usr/bin/python
# -*- coding: utf-8 -*-

print "Pokémon GO Việt Nam"

You can find more info here.
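The declaration only tells Python how to decode the bytes in the file; it does not convert them, so the file has to actually be saved in the declared encoding. The sketch below assumes the editor saved the file as cp1252 (what Notepad++ calls "ANSI" on many Western Windows installs) and that the console uses code page 850. Under those assumptions it reproduces both the `'utf8' codec can't decode byte 0xe9` kind of failure and the `PokÚmon GO Vi?t Nam` echo from the question. `decodes_as` is a hypothetical helper.

```python
def decodes_as(raw, encoding):
    """Return True if the byte string `raw` is valid in `encoding`."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


text = u"Pok\u00e9mon GO Vi\u1ec7t Nam"

# Bytes on disk if the editor saved the file as UTF-8: "é" takes two bytes, "ệ" three.
saved_as_utf8 = text.encode("utf-8")
# Bytes on disk if it saved as cp1252 ("ANSI"): "é" becomes the single byte 0xe9, and
# "ệ", which cp1252 lacks, is replaced by "?" (assumed editor behaviour).
saved_as_cp1252 = text.encode("cp1252", "replace")

# "# -*- coding: utf-8 -*-" makes Python decode the file's bytes as UTF-8.
print(decodes_as(saved_as_utf8, "utf-8"))    # True
print(decodes_as(saved_as_cp1252, "utf-8"))  # False: 0xe9 is not valid UTF-8 here

# A console using code page 850 displays byte 0xe9 as "Ú", giving "PokÚmon GO Vi?t Nam".
echoed = saved_as_cp1252.decode("cp850")
```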

For PyCharm settings, go to the menu PyCharm --> Preferences, then use the search box to look up "encoding"; you should reach the following screen:

[Screenshot: PyCharm encoding settings]

Nir Alfasi
  • Yeah, I tried that too. It works if I just print `u"pokémon"`, but not `"Pokémon GO Việt Nam"`. – Brenda Martinez Aug 31 '16 at 21:37
  • @BrendaMartinez: what encoding is your editor using? – Andrea Corbellini Aug 31 '16 at 21:38
  • @BrendaMartinez make sure that both your IDE encoding and the project encoding are set to 'utf-8' – Nir Alfasi Aug 31 '16 at 21:39
  • UTF-8, Notepad++ – Brenda Martinez Aug 31 '16 at 21:39
  • @BrendaMartinez well, Notepad++ is another problem :) Use a proper IDE such as PyCharm: you'll get all the benefits of an IDE, such as debugging and inspection capabilities, as well as many other goodies. – Nir Alfasi Aug 31 '16 at 21:41
  • Yeah, for that I've tested Sublime Text and I'm getting `UnicodeEncodeError: 'charmap' codec can't encode character u'\u1ec7' in position 13: character maps to <undefined>`. If I just print "Pokémon", all good; "Việt" is what's causing the problem. – Brenda Martinez Aug 31 '16 at 21:44
  • @BrendaMartinez Sublime is NOT an IDE; it's the same as Notepad++... You can check the default encoding settings of Sublime by going to the menu Sublime Text --> Preferences --> Settings - Default and then searching for `default_encoding` – Nir Alfasi Aug 31 '16 at 21:47
  • I've just installed PyCharm (PyCharm 5) and I'm using the encoding declaration, but I still get `UnicodeEncodeError: 'charmap' codec can't encode character u'\u1ec7' in position 13: character maps to <undefined>` – Brenda Martinez Aug 31 '16 at 21:51
  • @BrendaMartinez I've added a screenshot of PyCharm preferences - compare it to yours. – Nir Alfasi Aug 31 '16 at 22:04
  • Yeah, exactly the same. I'm just trying `#!/usr/bin/python # -*- coding: utf8 -*- print u"ệ"` with no luck. Does the code work for you? – Brenda Martinez Aug 31 '16 at 22:10
  • Yes, you can see in the screenshot that it does. Maybe there's some encoding issue with your OS? Can you try a different computer in order to eliminate potential causes? – Nir Alfasi Aug 31 '16 at 22:11
  • Thanks for your effort, that was correct. I tested it on my VPS (Ubuntu) and it worked, so Windows is the problem. I'm getting this kind of error all the time :/ – Brenda Martinez Aug 31 '16 at 22:15
  • Sorry I can't help you with Windows: my Win-machine is back at home and I'm at work... Anyway, glad you pinned the issue to the OS. The following might help: http://stackoverflow.com/questions/6344853/python-unicode-in-windows-terminal-encoding-used – Nir Alfasi Aug 31 '16 at 22:18
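The comments narrow the problem down to the Windows console. Assuming the console encodes output with cp1252 (the error only says a `'charmap'` codec, which Python uses for several single-byte Windows code pages), this sketch shows why `Pokémon` prints but `Việt` does not, and where the reported position 13 comes from.

```python
text = u"Pok\u00e9mon GO Vi\u1ec7t Nam"

# "Pokémon" on its own encodes fine: cp1252 has "é" (byte 0xe9).
pokemon = u"Pok\u00e9mon".encode("cp1252")

# The whole string does not: cp1252 has no "ệ" (U+1EC7). print() to a console
# using such a code page fails with this same UnicodeEncodeError.
error = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; Python 3 clears `exc` after the except block

print("can't encode U+%04X at position %d" % (ord(text[error.start]), error.start))
# can't encode U+1EC7 at position 13

# UTF-8 can represent every Unicode character ("é" takes 2 bytes, "ệ" takes 3),
# which is likely why the same script works in a UTF-8 terminal on Ubuntu.
utf8_bytes = text.encode("utf-8")
```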

Specify the encoding at the top of the program:

#!/usr/bin/python
# -*- coding: utf-8 -*-
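The declaration is only honoured on the first or second line of the file, and it has to match the pattern given in PEP 263. This is a simplified checker for that rule; `declared_encoding` is a hypothetical helper, not a standard-library function.

```python
import re

# Encoding-declaration pattern as given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(source_text):
    """Return the encoding declared in the first two lines of source_text, or None.

    Simplified: it does not enforce every rule (for example, that a declaration
    on line 2 must follow a comment-only line 1).
    """
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


header = "#!/usr/bin/python\n# -*- coding: utf-8 -*-\n\nprint('hi')\n"
print(declared_encoding(header))                           # utf-8
print(declared_encoding("\n\n# -*- coding: utf-8 -*-\n"))  # None: declared too far down
```

On Python 3, `tokenize.detect_encoding` performs the full check if you need it.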

rafaelc

As an alternative, you can encode the Unicode string:

print u"Pokémon GO Việt Nam".encode('utf-8')

The advantage is that the bytes in the resulting string are independent of the encoding of the source file: u"ệ".encode('utf-8') is always the same 3 bytes "\xe1\xbb\x87".
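A quick check of that claim: whether the character is written as a `\u` escape or by its Unicode name, `.encode('utf-8')` yields the same three bytes, and decoding them gives the character back. The comments spell out how U+1EC7 maps onto those bytes.

```python
ech = u"\u1ec7"  # "ệ" written as an escape, so this file stays pure ASCII
named = u"\N{LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW}"  # same character

encoded = ech.encode("utf-8")
# U+1EC7 is 0001 1110 1100 0111 in binary; UTF-8 packs it into three bytes:
# 1110_0001, 10_111011, 10_000111 -> E1 BB 87
print(repr(encoded))  # '\xe1\xbb\x87' on Python 2, b'\xe1\xbb\x87' on Python 3

# However the character is written, it encodes to the same bytes and decodes back.
assert named.encode("utf-8") == encoded
assert encoded.decode("utf-8") == ech
```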

It is also consistent with what you'd do if you have a Unicode string in a variable:

# get text from somewhere...
text = u"Pokémon GO Việt Nam"

# assuming your terminal expects UTF-8 -- this won't work on Windows.
print text.encode('utf-8')
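Hard-coding `'utf-8'` assumes a UTF-8 terminal. A hedged alternative is to encode with whatever encoding the output stream reports and replace characters it cannot represent. `to_console_bytes` and `FakeWindowsConsole` are hypothetical names, and the cp1252 console is an assumption.

```python
import sys


def to_console_bytes(text, stream=None):
    """Encode text with the encoding `stream` reports (sys.stdout by default).

    Characters the encoding lacks become "?" instead of raising
    UnicodeEncodeError. If the stream reports no encoding (Python 2 does this
    when output is redirected), UTF-8 is assumed.
    """
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    return text.encode(encoding, "replace")


class FakeWindowsConsole(object):
    """Stand-in for a console whose code page is cp1252 (an assumption)."""
    encoding = "cp1252"


text = u"Pok\u00e9mon GO Vi\u1ec7t Nam"
print(repr(to_console_bytes(text, FakeWindowsConsole())))
# 'Pok\xe9mon GO Vi?t Nam' on Python 2, b'Pok\xe9mon GO Vi?t Nam' on Python 3
```

To print for real, write the bytes to `sys.stdout` on Python 2 or `sys.stdout.buffer` on Python 3. The trade-off of `'replace'` is that `ệ` still shows up as `?`; displaying it correctly requires a console encoding that actually contains it.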
roeland