A problem has been plaguing me all day: Python 3.4.1 keeps raising

UnicodeEncodeError: 'charmap' codec can't encode character '\u25be' in position 1075: character maps to <undefined>

Unicode shows that U+25BE is ▾ BLACK DOWN-POINTING SMALL TRIANGLE.

I have been trying to read a file that contains this little guy and no matter what I do it doesn't seem to work. Here is the relevant code:

whole = ""
# src is the path to the saved page; the context manager closes the file when done
with open(src, 'r', encoding='utf-8') as f:
    for l in f:
        whole += l
print(whole)

The print will throw the error above. I have tried encoding it to ASCII with:

l.encode('ascii', 'ignore')

and still nothing. Am I decoding the file incorrectly? If it helps, the file is a saved webpage, and fetching the page with the urllib.request module yields exactly the same result.

I'm using Windows 7 if that makes a difference.

Jonathan Leffler
derigible
  • What version of Python? Works for me in 3.4.0. – Wooble Jul 31 '14 at 23:17
  • Yup, a real character: http://www.fileformat.info/info/unicode/char/25be/index.htm – Mooing Duck Jul 31 '14 at 23:17
  • I am running 3.4.1 on windows 7. – derigible Jul 31 '14 at 23:19
  • What line has the error? http://www.gossamer-threads.com/lists/python/python/1040873 implies that it's actually the `print(whole)`, since the console can't render that character. – Mooing Duck Jul 31 '14 at 23:20
  • And I am aware that it is a real character. My question is whether I am decoding it correctly or whether the problem lies somewhere else. – derigible Jul 31 '14 at 23:20
  • So this is an issue with the Eclipse console? I am using Eclipse to code. – derigible Jul 31 '14 at 23:21
  • related: http://stackoverflow.com/questions/14630288/unicodeencodeerror-charmap-codec-cant-encode-character-maps-to-undefined – Wooble Jul 31 '14 at 23:23
  • Ah, that seems to have pointed me in the right direction. @Wooble, if you post an answer based on the question you linked, I will accept it, since that is what I used. – derigible Jul 31 '14 at 23:26
  • related: [Python, Unicode, and the Windows console](http://stackoverflow.com/q/5419/4279) – jfs Aug 24 '15 at 08:30

1 Answer

I assume you are printing to the Windows console. The Windows console does not default to (and has poor support for) UTF-8, but you can change the code page and try again:

C:\>chcp 65001
Active code page: 65001

C:\>py
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u25be')
▾

>>> import unicodedata as ud
>>> ud.name('\u25be')
'BLACK DOWN-POINTING SMALL TRIANGLE'

That displays the correct character for me on US English Windows using the Consolas console font, but not the Lucida Console or Raster Fonts fonts. Make sure the font you are using supports the character.
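If changing the code page or font is not an option, another possible workaround (not part of the session above, and only a sketch) is to replace whatever the console's encoding cannot represent before printing. The helper name `console_safe` is hypothetical, and the sketch assumes `sys.stdout.encoding` reports the encoding that `print` will actually use:

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent
    replaced by backslash escapes such as \\u25be.

    console_safe is a hypothetical helper name used for illustration.
    """
    # Assumption: sys.stdout.encoding is the encoding print() will use.
    # Fall back to ASCII if it is missing (e.g. stdout has been replaced).
    enc = encoding or sys.stdout.encoding or 'ascii'
    # Encode with an error handler instead of letting the codec raise,
    # then decode again so print() receives an ordinary str.
    return text.encode(enc, errors='backslashreplace').decode(enc)

# The failing print(whole) from the question would become
# print(console_safe(whole)); a single character is used here.
print(console_safe('\u25be'))
```

On a console whose code page lacks the character, this prints the escape sequence instead of raising UnicodeEncodeError; on a UTF-8 console the triangle is passed through unchanged. Unlike `l.encode('ascii', 'ignore')`, which returns a new bytes object and leaves `l` untouched, the helper returns a str that can be printed directly.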

Mark Tolonen
  • [the support for multibyte codepages such as 65001 in Windows console is buggy](http://stackoverflow.com/q/31846091/4279). A better alternative is to [use `win-unicode-console` package](http://stackoverflow.com/a/30551552/4279) – jfs Aug 24 '15 at 08:34