17

Can I change default open() (io.open() in 2.7) text encoding in a cross-platform way?

I'd like to avoid specifying open(..., encoding='utf-8') each time.

In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
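
To see which encoding that is on a given machine, a minimal sketch along these lines may help. It only reads the current default and changes nothing; the temporary file exists purely for the check, and the variable names are illustrative.

```python
import locale
import os
import tempfile

# Encoding that open() falls back to in text mode when none is given.
default_encoding = locale.getpreferredencoding(False)

# Cross-check against a real file object opened without an encoding.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path) as handle:
        actual_encoding = handle.encoding
finally:
    os.remove(path)

print(default_encoding, actual_encoding)
```

The two names printed may differ in spelling or case (for example UTF-8 versus utf-8), so compare them through codecs.lookup() rather than as raw strings.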

However, the documentation doesn't say how to set the preferred encoding. The function is in the locale module, so do I need to change the locale? Is there a reliable cross-platform way to set a UTF-8 locale? Will it affect anything other than the default text file encoding?

Or are locale changes dangerous (they can break something), so that I should stick to a custom wrapper such as:

def uopen(*args, **kwargs):
    return open(*args, encoding='UTF-8', **kwargs)
user

5 Answers

20

Don't change the locale or preferred encoding, because:

  • it may affect other parts of your code (or the libraries you're using); and
  • it won't be clear that your code depends on open using a specific encoding.

Instead, use a simple wrapper:

from functools import partial
open_utf8 = partial(open, encoding='UTF-8')

You can also specify defaults for all keyword arguments (should you need to).
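
As a rough illustration of presetting several keyword arguments at once, the sketch below fixes encoding, errors, and newline. The particular values and the temporary file are assumptions chosen for the demonstration, not a recommendation.

```python
import os
import tempfile
from functools import partial

# Preset several keyword arguments of open(); all remain overridable per call.
open_utf8 = partial(open, encoding="UTF-8", errors="strict", newline="\n")

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open_utf8(path, "w") as handle:
        handle.write("天地玄黄\n")
    # A keyword passed at call time replaces the preset value.
    with open_utf8(path, encoding="utf-8-sig") as handle:
        text = handle.read()
    # Read the raw bytes with the plain built-in to inspect what was written.
    with open(path, "rb") as handle:
        raw = handle.read()
finally:
    os.remove(path)
```

Note that the partial always passes encoding, so it is only suitable for text mode; binary mode still needs the plain open().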

Peter Sutton
  • I've tried `locale.setlocale()` and it didn't change the default encoding on Windows, not even to a different non-Unicode one. So I inspected CPython's source code and found that `getpreferredencoding` [uses](https://github.com/python/cpython/blob/f7eae0adfcd4c50034281b2c69f461b43b68db84/Modules/_localemodule.c#L304) the [GetACP](https://msdn.microsoft.com/en-us/library/windows/desktop/dd318070(v=vs.85).aspx) WinAPI function, which "retrieves the current Windows ANSI code page identifier". – user Nov 30 '17 at 04:14
  • There is no mechanism in Python to override this behavior except to use version-dependent hacks like the one suggested by Joran in the other answer and those found in the answers to [this question](https://stackoverflow.com/questions/31469707/changing-the-locale-preferred-encoding-in-python-3-in-windows). From what I've read, there is also no mechanism to set this encoding to UTF-8 in Windows outside of Python. Therefore, given the fact that there is no way to set this preference without resorting to hacks, I agree that changing this may be unreliable. Answer accepted. – user Nov 30 '17 at 04:14
2

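You can set the default encoding, but it's really hacky. This is Python 2 only, and it changes the codec used for implicit str/unicode conversions, not the encoding that io.open() picks:

import sys
sys.getdefaultencoding()  # returns the current default encoding
sys.setdefaultencoding("utf8")  # AttributeError: site.py deletes setdefaultencoding at startup
reload(sys)  # Python 2 only: reloading sys restores the deleted function
sys.setdefaultencoding("utf8")  # now it succeeds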

I would instead do

main_script.py

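import __builtin__  # Python 2 name; the module is called builtins in Python 3
import io
old_open = io.open  # accepts encoding (the Python 2 built-in open does not)
def uopen(*args, **kwargs):
    # Call the saved original, not open, to avoid infinite recursion
    return old_open(*args, encoding='UTF-8', **kwargs)
__builtin__.open = uopen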

Then anywhere you call open, it will use the UTF-8 encoding. However, it raises a TypeError if a caller passes encoding explicitly, and a ValueError if a file is opened in binary mode, because the wrapper always adds the keyword.

Or just explicitly pass the encoding any time you open a file, or use your wrapper.

Python's general philosophy is "explicit is better than implicit", which implies that the "right" solution is to explicitly declare your encoding when opening a file.
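
The failure modes of a wrapper that hard-codes the keyword can be reproduced without touching the built-in. This sketch uses a local uopen and a temporary file, both purely illustrative:

```python
import os
import tempfile

def uopen(*args, **kwargs):
    # Hard-codes the keyword, so a caller-supplied encoding collides with it.
    return open(*args, encoding="UTF-8", **kwargs)

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
errors = []
try:
    # First call: explicit encoding. Second call: binary mode.
    for call_kwargs in ({"mode": "r", "encoding": "latin-1"}, {"mode": "rb"}):
        try:
            uopen(path, **call_kwargs).close()
        except (TypeError, ValueError) as exc:
            errors.append(type(exc).__name__)
finally:
    os.remove(path)

print(errors)
```

The explicit encoding produces a duplicate-keyword TypeError, and binary mode produces a ValueError because open() rejects an encoding there.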

Joran Beasley
  • I'm not sure if it's safe to just overwrite the function in `builtins` as that also modifies the function in already imported modules and some libraries may rely on the default value. Still, it may come in handy in some cases. Thanks, +1 – user Jul 24 '14 at 04:56
1

If you really need to change the default encoding, you can replace the built-in open function.

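import builtins  # __builtins__ is a CPython detail and is a dict outside __main__
original_open = builtins.open
def uopen(*args, **kwargs):
    # Only text mode accepts an encoding, so leave binary mode untouched
    if "b" not in (args[1] if len(args) >= 2 else kwargs.get("mode", "")):
        kwargs.setdefault("encoding", "UTF-8")
    return original_open(*args, **kwargs)
builtins.open = uopen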

I wrote and tested this snippet after finding a mailing-list thread about replacing print.
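
If replacing open for the whole process is too broad, the same idea can be limited to a with block that restores the original afterwards. The following is only a sketch: utf8_open_default is a hypothetical helper name, and the replacement is still global (and visible to other threads) while the block runs.

```python
import builtins
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def utf8_open_default():
    """Hypothetical helper: make UTF-8 the text-mode default inside a with block."""
    original_open = builtins.open

    def uopen(file, mode="r", *args, **kwargs):
        # Only text mode takes an encoding. Skip the default if encoding was
        # passed positionally (it is open()'s 4th parameter, args[1] here).
        if "b" not in mode and len(args) < 2:
            kwargs.setdefault("encoding", "UTF-8")
        return original_open(file, mode, *args, **kwargs)

    builtins.open = uopen
    try:
        yield
    finally:
        builtins.open = original_open  # always restore the real open()

real_open = builtins.open
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with utf8_open_default():
        with open(path, "w") as handle:
            encoding_inside = handle.encoding
        with open(path, "rb") as handle:  # binary mode passes through untouched
            binary_ok = handle.read() == b""
finally:
    os.remove(path)

restored = builtins.open is real_open
```

Restoring in a finally clause keeps the override from leaking if the body raises.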

JojOatXGME
1

Maybe PEP 540 (UTF-8 Mode, available since Python 3.7) is what you want:

https://peps.python.org/pep-0540/

Use -Xutf8

python.exe -Xutf8 -c "open('tmp.txt', 'w').write('天地玄黄0123'); print(open('tmp.txt').read())"

Use PYTHONUTF8 in PowerShell

$env:PYTHONUTF8=1; python.exe -c "open('tmp.txt', 'w').write('天地玄黄0123'); print(open('tmp.txt').read())"

Use PYTHONUTF8 in Cmd

set PYTHONUTF8=1&& python.exe -c "open('tmp.txt', 'w').write('天地玄黄0123'); print(open('tmp.txt').read())"

Use PYTHONUTF8 in Bash

PYTHONUTF8=1 python -c "open('tmp.txt', 'w').write('天地玄黄0123'); print(open('tmp.txt').read())"

On Windows, you can also run setx PYTHONUTF8 1 to save it as a user-level environment variable; it takes effect in newly started shells.
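
To confirm from Python that the switch takes effect, one option is to start a child interpreter with -X utf8 and ask it for its default. This is a sketch assuming Python 3.7 or later; child_default_encoding is an illustrative helper name.

```python
import subprocess
import sys

def child_default_encoding(*interpreter_args):
    """Start a child interpreter and report the encoding open() would default to."""
    code = "import locale; print(locale.getpreferredencoding(False))"
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()

# With UTF-8 Mode enabled, the child reports UTF-8 regardless of the locale.
with_utf8_mode = child_default_encoding("-X", "utf8")
print(with_utf8_mode)
```

Inside a running program, sys.flags.utf8_mode reports whether the mode is active. The exact spelling of the reported name varies between Python versions (UTF-8 or utf-8), so compare case-insensitively.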

wisbucky
BaiJiFeiLong
  • The only answer that worked for me in python 3! Although for a proper test, I think you should exclude the `encoding='utf8'` parameter to make it fail. Otherwise, it works successfully without setting the environment variable. – wisbucky Jul 08 '22 at 22:54
-1

I would not change anything in the locale, as it could have a lot of side effects in other parts of your system. The default encoding of open comes from the locale, which is shared state, so changing it can affect code well beyond your own open calls, including the libraries you use. Your wrapper is clean and portable, and it looks like the correct solution.

Philip Massey