Obligatory intro noting that I've done some research

This seems like it should be straightforward (I am happy to close as a duplicate if a suitable target question is found), but I'm not familiar enough with character encodings and how Python handles them to suss it out myself. At risk of seeming lazy, I will note the answer very well may be in one of the links below, but I haven't yet seen it in my reading.

I've referenced some of the docs: Unicode HOWTO, codecs.py docs

I've also looked at some old highly-voted SO questions: Writing Unicode text to a text file?, Python, Unicode, and the Windows console


Question

Here's an MCVE that demonstrates my problem:

with open('foo.txt', 'wt') as outfile:
    outfile.write('\u014d')

The traceback is as follows:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\cashamerica\AppData\Local\Programs\Python\Python3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u014d' in position 0: character maps to <undefined>

I'm confused because the code point U+014D is 'ō', an assigned code point, LATIN SMALL LETTER O WITH MACRON (official Unicode source)

I can even print the character to the Windows console (but it renders as a normal 'o'):

>>> print('\u014d')
o
Graham

1 Answer

Your platform's default encoding is cp1252 (as the traceback shows), and cp1252 has no mapping for ō (U+014D).

Write (and read) your file with an explicit encoding:

with open('foo.txt', 'wt', encoding='utf8') as outfile:
    outfile.write('\u014d')
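To see the difference without depending on your machine's default, here is a small sketch that tries both codecs directly and then round-trips the character through a UTF-8 file. The helper names `can_encode` and `roundtrip_utf8` are made up for illustration, and the temporary directory is only there so the example does not touch your real `foo.txt`.

```python
import os
import tempfile

CHAR = "\u014d"  # LATIN SMALL LETTER O WITH MACRON


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def roundtrip_utf8(text):
    """Write `text` to a temporary file as UTF-8 and read it back (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.txt")
        with open(path, "wt", encoding="utf-8") as outfile:
            outfile.write(text)
        with open(path, "rt", encoding="utf-8") as infile:
            return infile.read()


if __name__ == "__main__":
    print(can_encode(CHAR, "cp1252"))    # cp1252 has no slot for this character
    print(can_encode(CHAR, "utf-8"))     # UTF-8 can encode any code point
    print(roundtrip_utf8(CHAR) == CHAR)  # the file round-trips cleanly
```

Whatever encoding you pick for writing, pass the same one when you read the file back; otherwise the bytes may decode to the wrong characters.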
Daniel
    I did not realize `open` encoded to `cp1252`; that's really kind of a rough edge. I guess I should have read the [docs for `open`](https://docs.python.org/3/library/functions.html#open), which state: "In text mode, if *encoding* is not specified the encoding used is platform dependent: `locale.getpreferredencoding(False)` is called to get the current locale encoding." – Graham Apr 07 '19 at 12:17
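For anyone who wants to check what their own machine would pick, the call quoted from the docs can be run directly. This is a rough sketch, not a definitive diagnostic: `default_text_encoding` and `default_can_encode` are hypothetical helper names, and the result depends on your platform and locale settings.

```python
import codecs
import locale


def default_text_encoding():
    """Return the normalized name of the locale encoding that, per the docs
    quoted above, text-mode open() uses when no encoding is given."""
    name = locale.getpreferredencoding(False)
    # codecs.lookup() normalizes aliases, e.g. "UTF-8" becomes "utf-8".
    return codecs.lookup(name).name


def default_can_encode(text):
    """Return True if the locale encoding can represent `text`."""
    try:
        text.encode(default_text_encoding())
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(default_text_encoding())
    print(default_can_encode("\u014d"))
```

If the second line prints `False`, a plain `open(path, 'wt')` would fail on that character the same way the question's example does.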