[Using Python 3.2]

If I don't pass an encoding argument to open(), the file is opened using locale.getpreferredencoding(). For example, on my Windows machine, any call like open('abc.txt') decodes the file using cp1252.

I would like to switch all my input files to utf-8. Obviously, I can add encoding = 'utf-8' to all my open function calls. Or, better, encoding = MY_PROJECT_DEFAULT_ENCODING, where the constant is defined at the global level somewhere.
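For reference, here is a minimal sketch of the constant-based approach described above. The helper name `open_text` is hypothetical (it is not part of the standard library), and the sketch assumes text mode only, since passing an encoding in binary mode raises an error.

```python
import os
import tempfile

# Project-wide constant, named as in the question.
MY_PROJECT_DEFAULT_ENCODING = 'utf-8'


def open_text(path, mode='r', **kwargs):
    """Hypothetical helper: like open(), but text files default to the
    project encoding instead of locale.getpreferredencoding()."""
    kwargs.setdefault('encoding', MY_PROJECT_DEFAULT_ENCODING)
    return open(path, mode, **kwargs)


# Round-trip a string whose bytes differ between cp1252 and UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'abc.txt')
with open_text(path, 'w') as f:
    f.write('caf\u00e9 \u20ac')
with open_text(path) as f:
    text = f.read()
```

This still requires replacing every open call with the helper, so it only centralizes the choice of encoding; it does not change the default itself.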

But I was wondering if there is a clean way to avoid editing all my open calls, by changing the "default" encoding. Is it something I can change by changing the locale? Or by changing a parameter inside the locale? I tried to follow the Python manual but failed to understand how this is supposed to be used.

Thanks!

max
  • Before I found this question, I asked a differently worded duplicate and received some answers. You may find them useful: [Is there a way to change Python's open() default text encoding?](http://stackoverflow.com/q/24897644/3075942) – user Jul 24 '14 at 07:19
  • There are some more answers in a similar question: [Changing the “locale preferred encoding” in Python 3 in Windows](http://stackoverflow.com/questions/31469707/changing-the-locale-preferred-encoding-in-python-3-in-windows) – Antony Hatchkins Dec 17 '15 at 22:17
  • See https://stackoverflow.com/a/61570285/7796217 for a solution which I got working on Python 3.7. – Peter Fogh Jun 12 '20 at 08:46

1 Answer

In Windows, with Python 3.3+, execute chcp 65001 in the console or a batch file before running Python in order to change the locale encoding to UTF-8.

Ethan Furman
Ignacio Vazquez-Abrams
  • Thanks. It does not work for Python 3.2 unless I `set PYTHONIOENCODING=UTF-8`, but it seems to work for Python 3.3. Let me try it some more, and I'll update this comment. – max Jul 17 '12 at 07:30
  • This accepted answer doesn't work for me at all. I tried both `chcp 65001` and `set PYTHONIOENCODING=UTF-8`. Please see my "answer" below, which should really be here in this comment location. Thanks. – walrus Jul 17 '15 at 01:47
  • Codepage 65001 is broken for use in the console. There are multiple bugs related to the console host process, conhost.exe, and C runtime. It's getting better with each version, but even Windows 10 with VC++ 14 isn't bug free yet. Also, this only affects `_Py_device_encoding` for stdin, stdout, and stderr. `locale.getpreferredencoding()` returns the Windows ANSI encoding -- as it should. – Eryk Sun Jul 17 '15 at 14:00
  • @eryksun Thanks for the info, but it has too much Windows-specific jargon for me. I rarely use Windows. All I want is a way to say to either Windows 8 or to Python 3: "Dear Windows 8 / Python 3, Please be informed that all the text files on this computer should be encoded in UTF-8 without exception. Please remember this fact in the future when opening text files. Thanks." – walrus Jul 18 '15 at 01:10
  • `chcp` is not necessary and does not solve the problem - it just controls your terminal's settings and has nothing to do with IO or opening files. PYTHONIOENCODING is a better way to go but this does not solve the problem in this case either - it just controls stdio (piping from/to stdin/stdout). – ejm Aug 07 '19 at 12:29
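
The distinction drawn in these comments can be checked with a small diagnostic sketch. It assumes nothing beyond the standard library and prints three values side by side: the locale's preferred encoding, the encoding open() actually picks when none is given, and the encoding of stdout (which is what `chcp` and `PYTHONIOENCODING` are said to influence). The variable names are illustrative only.

```python
import codecs
import locale
import os
import sys
import tempfile

# Encoding that the locale module reports as preferred.
preferred = locale.getpreferredencoding(False)

# Encoding that open() picks when no encoding argument is given.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with open(path, 'w') as f:
    file_encoding = f.encoding
os.remove(path)

# Encoding of standard output; may be None if stdout has been replaced.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print('locale preferred encoding:', preferred)
print('open() default encoding:  ', file_encoding)
print('stdout encoding:          ', stdout_encoding)

# Compare by canonical codec name, since spelling and case can vary.
same_default = (codecs.lookup(file_encoding).name
                == codecs.lookup(preferred).name)
```

Running this before and after `chcp 65001` or `set PYTHONIOENCODING=UTF-8` should show whether only the stdout line changes. As far as I know, on Python 3.7 and later the UTF-8 mode switch (`PYTHONUTF8=1` or `-X utf8`) is the setting that changes the first two lines, but that is outside what was tested in this thread.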