10
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'

How do I change that? I know how to change the default system encoding.

>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('ascii')

But there is no sys.setfilesystemencoding.

wim
arjunaskykok
  • Note that there was [`sys.setfilesystemencoding`](https://bugs.python.org/issue3187#msg74080) function and also env var [`PYTHONFSENCODING`](https://bugs.python.org/issue8622) in early versions of Python 3.x. They were problematic and got removed, now Python uses locale encoding as the filesystem encoding. See [_Painful History of the Filesystem Encoding_](https://vstinner.github.io/painful-history-python-filesystem-encoding.html) from Victor Stinner's blog. – wim Feb 14 '22 at 02:59

2 Answers

15

There are two ways to change it:

  1. (linux-only) export LC_CTYPE=en_US.UTF-8 before launching python:
$ LC_CTYPE=C python -c 'import sys; print(sys.getfilesystemencoding())'
ANSI_X3.4-1968
$ LC_CTYPE=en_US.UTF-8 python -c 'import sys; print(sys.getfilesystemencoding())'
UTF-8

Note that LANG serves as the default value for LC_CTYPE if it is not set, while LC_ALL overrides both LC_CTYPE and LANG.

  2. monkeypatching:
import sys
sys.getfilesystemencoding = lambda: 'UTF-8'

Both methods let functions like os.stat accept unicode strings (Python 2.x). Otherwise those functions raise an exception when they encounter non-ASCII characters in the filename.

Update: For variant (1), the locale has to be available (present in the output of locale -a) for this setting to have the desired effect.
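To see which value a given environment would produce, here is a minimal Python 3 sketch (the answer itself targets Python 2.x). The helper names `effective_ctype` and `probe_fs_encoding` are made up for illustration. The first applies the LC_ALL > LC_CTYPE > LANG precedence described above; the second starts a fresh interpreter with the chosen variables and reports what `sys.getfilesystemencoding()` returns there. The result depends on the Python version and on which locales are installed, so no output is shown.

```python
import os
import subprocess
import sys

LOCALE_VARS = ("LC_ALL", "LC_CTYPE", "LANG")  # highest precedence first


def effective_ctype(env):
    """Return the variable value that decides the character-type locale.

    Hypothetical helper: an empty value is treated as unset, and None is
    returned when nothing is set (the default is then up to the system).
    """
    for name in LOCALE_VARS:
        value = env.get(name)
        if value:
            return value
    return None


def probe_fs_encoding(**overrides):
    """Run a child interpreter with only the given locale variables set
    and return its sys.getfilesystemencoding() as a string."""
    env = dict(os.environ)
    for name in LOCALE_VARS:
        env.pop(name, None)  # start from a clean locale environment
    env.update(overrides)
    result = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.getfilesystemencoding())"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


print(effective_ctype({"LANG": "en_US.UTF-8", "LC_CTYPE": "C"}))
print(probe_fs_encoding(LC_CTYPE="C"))
```

Probing in a child process matters because the filesystem encoding is fixed when the interpreter starts; changing `os.environ` inside a running session would not affect it.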

Antony Hatchkins
  • @sureshvv What is your OS? – Antony Hatchkins Feb 13 '17 at 04:45
  • Ubuntu 16.04. Had to add LANG=en_US.UTF8 to /etc/environment and reboot. – sureshvv Feb 20 '17 at 09:49
  • @sureshvv reboot is definitely an overkill in this situation, but I'm glad that you've resolved the issue anyway. Did you launch python directly from command line or as a system service? – Antony Hatchkins Feb 20 '17 at 10:07
  • Only from the command line. The change I made did not become effective until reboot. – sureshvv Feb 20 '17 at 10:55
  • @sureshvv It's not surprising about `/etc/environment` but `export LANG=en_US.UTF8` has immediate effect – Antony Hatchkins Feb 20 '17 at 14:16
  • Would setting LC_ALL interfere with LANG? To be safe, I set both. May be they cancelled out. – sureshvv Feb 21 '17 at 09:16
  • @sureshvv The specific env var that is responsible for `getfilesystemencoding` is LC_CTYPE. If it is not set, LANG is used as the default. Finally, if LC_ALL is set, it overrides both LC_CTYPE and LANG. – Antony Hatchkins Feb 21 '17 at 10:21
  • @sureshvv See also [this answer](http://stackoverflow.com/questions/28522990/where-py-filesystemdefaultencoding-is-set-in-python-source-code) – Antony Hatchkins Feb 21 '17 at 10:29
  • @wim UTF8 and UTF-8 are synonyms. Internally it will be UTF-8 anyway. en_US and C have subtle differences. For example, LC_ALL=C results in DD/MM/YYYY date format and en_US gives MM/DD/YYYY. Both C.UTF-8 and en_US.UTF-8 will set `getfilesystemencoding()` correctly. Nowadays it is utf-8 out of the box (debian and ubuntu). Are you dealing with an old pc? – Antony Hatchkins Feb 12 '22 at 15:49
  • @wim fixed now. – Antony Hatchkins Feb 13 '22 at 12:57
  • Something else that might be worth mentioning: the locale has to be available (present in `locale -a`) for this setting to have the desired effect. If the necessary langpacks are not installed, then even when you've set `LC_CTYPE=C.utf-8` you can still have `sys.getfilesystemencoding()` returning ascii. – wim Feb 13 '22 at 13:17
  • @AntonyHatchkins I think you should remove method 2 from the answer. _It does not work_. Python uses the locale encoding as the filesystem encoding, and monkeypatching this function after the interpreter has already started will not change anything that matters. I don't believe this would have worked even back in python 2.7 either, tbh - `os.stat` does not call `sys.getfilesystemencoding()`. – wim Feb 14 '22 at 02:54
  • @wim It worked at the time of writing. This whole story was a nasty bug in the Python implementation. Believe it or not, back in 2015 Python was unable to open files with non-ASCII characters in the filename out of the box. Thankfully it is fixed now. Why are you investigating an issue from the all-forgotten past? Do you have a valid use case for it in 2022? Btw you have a broken link in your profile. – Antony Hatchkins Feb 14 '22 at 04:52
  • Yes, I saw a case recently where Python 3.6 was still using ANSI_X3.4-1968 even though the var was set to C.UTF-8. It turned out to be because the docker image in use did not have any langpacks installed (presumably to save space and slim down containers). Which broken link in profile? I checked and did not see one.. – wim Feb 14 '22 at 11:24
  • @wim Yes, working with slim docker images is like walking on thin ice :) https://downforeveryoneorjustme.com/wimglenn.com?proto=http&www=1 – Antony Hatchkins Feb 15 '22 at 06:50
  • @wim I vaguely remember that one of the encoding-related workarounds only worked if run in `sitecustomize.py`. I tried to reproduce this bug on a number of machines, but couldn't find one with ANSI_X3.4-1968 encoding. – Antony Hatchkins Feb 15 '22 at 07:11
3

The file system encoding is, in many cases, an inherent property of the operating system. It cannot be changed. If, for some reason, you need to create files with names encoded differently than the filesystem encoding implies, don't use Unicode strings for filenames. (Or, if you're using Python 3, use a bytes object instead of a string.)

See the documentation for details. In particular, note that, on Windows systems, the file system is natively Unicode, so no conversion is actually taking place, and, consequently, it's impossible to use an alternative filesystem encoding.
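A minimal sketch of the bytes-filename suggestion in Python 3, assuming a POSIX filesystem that accepts arbitrary byte sequences in names (typical on Linux; Windows and some macOS filesystems may reject the name used here). The helper `write_with_bytes_name` and the Latin-1 sample name are made up for illustration.

```python
import os
import tempfile


def write_with_bytes_name(directory, raw_name, payload):
    """Create a file whose name is given as raw bytes, then list the
    directory as bytes so no filename decoding takes place.

    Hypothetical helper; all three arguments are bytes objects.
    """
    path = os.path.join(directory, raw_name)  # bytes + bytes -> bytes path
    with open(path, "wb") as handle:
        handle.write(payload)
    return os.listdir(directory)  # bytes in, list of bytes out


with tempfile.TemporaryDirectory() as tmp:
    # Encode the name ourselves (Latin-1 here) instead of letting Python
    # apply the filesystem encoding to a text string.
    names = write_with_bytes_name(
        os.fsencode(tmp), "café.txt".encode("latin-1"), b"hello"
    )
    print(names)
```

Because both the path and the directory listing stay as bytes, the interpreter's filesystem encoding is never applied to the name itself.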