Why does locale.getpreferredencoding() return 'ANSI_X3.4-1968' instead of 'UTF-8'?

Question

Whenever I try to read UTF-8 encoded text files using open(file_name, encoding='utf-8'), I get an error saying the ASCII codec can't decode some characters (e.g. when running for line in f: print(line)).

Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> import sys
>>> sys.getfilesystemencoding()
'ascii'
>>>

and the locale command prints:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE=en_HK.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

score 7 · Answer 1 · answered Jul 12 '18 at 09:50

I had a similar problem. In my case the environment variable LANG was initially not set (you can check this by running env):

$ python3 -c 'import locale; print(locale.getdefaultlocale())'
(None, None)
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968

The available locales for me were (on a fresh Ubuntu 18.04 Docker image):

$ locale -a
C
C.UTF-8
POSIX

So I picked the UTF-8 one:

$ export LANG="C.UTF-8"

And then things work:

$ python3 -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
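The "list the installed locales and pick a UTF-8 one" step can also be sketched in Python. This is only an illustration: the helper names utf8_locales and installed_locales are made up for this example, and it assumes a locale command is on the PATH (it returns an empty list otherwise).

```python
import subprocess


def utf8_locales(names):
    """Keep only the locale names that mention a UTF-8 codeset.

    `locale -a` may spell the codeset as "UTF-8" or "utf8", so match both.
    """
    return [n for n in names if "utf-8" in n.lower() or "utf8" in n.lower()]


def installed_locales():
    """Return the names printed by `locale -a`, or [] if the command is missing."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return []
    return result.stdout.decode("ascii", errors="replace").split()


if __name__ == "__main__":
    # Any name printed here is a candidate value for LANG.
    print(utf8_locales(installed_locales()))
```

On the image described above, the filtered list would contain only C.UTF-8, which is the value that was exported as LANG.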

If you pick a locale that is not available, such as

$ export LANG="en_US.UTF-8"

it will not work:

$ python3 -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968

This is also why the locale command prints these error messages:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
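To check from inside Python whether the locale named in the environment is actually installed, one option is to try activating it and catch the failure. This is a minimal sketch; locale_usable is a hypothetical helper name, and it only probes the LC_CTYPE category.

```python
import locale


def locale_usable(name=""):
    """Return True if `name` can be activated for LC_CTYPE.

    The empty string means "whatever LC_ALL / LC_CTYPE / LANG ask for".
    The previous setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, changes nothing
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Typically "unsupported locale setting": the locale is not installed.
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print("environment locale usable:", locale_usable())
```

If this prints False while LANG is set, the variable most likely names a locale that is missing from the system, as in the en_US.UTF-8 case above.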

score 1 · Answer 2 · edited Aug 30 '19 at 15:02


I solved it by running the following:

apt install locales-all

answered Aug 30 '19 at 14:02 by mati kepa · edited Aug 30 '19 at 15:02 by Vandan Revanur

I had this issue with a Docker scratch image, so I needed to copy `/usr/lib/locale` from another image. – Winand Apr 03 '23 at 15:09

score 0 · Answer 3 · answered Nov 08 '21 at 14:42

By default, Python tries to honor the Unix locale system, including the LC_ALL, LC_CTYPE, and LANG environment variables. In theory, standards are good, but in my experience these variables only cause problems. They're sometimes set to ridiculous values, like non-UTF-8 character sets, for no good reason. Then Python throws errors when print()ing non-ASCII text.
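The failure mode described here can be reproduced without touching the system configuration. The sketch below stands in for a terminal stream by wrapping an in-memory buffer; can_print is a hypothetical helper, and the ASCII encoding is chosen by hand to imitate what Python picks when the locale resolves to ANSI_X3.4-1968.

```python
import io


def can_print(text, encoding):
    """Try to print `text` to a stream that uses `encoding`; report success."""
    # An in-memory stand-in for sys.stdout with a fixed encoding.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    sample = "caf\u00e9"  # contains one non-ASCII character
    print("ascii :", can_print(sample, "ascii"))
    print("utf-8 :", can_print(sample, "utf-8"))
```

The same text that fails on the ASCII stream goes through on the UTF-8 one, which suggests the problem lies in the output encoding rather than in how the file was read.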

You can fix this by finding out what these environment variables are set to, and why, and changing them to something Unicode-capable. But system configuration can be a can of worms.

Python 3.7 and later offer these two quick fixes:

- Set PYTHONUTF8=1 in the environment when running the script.
- If you can't do that, then early in your script, force stdout to be UTF-8:
```
import sys

sys.stdout.reconfigure(encoding='utf-8')
```
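To see what the PYTHONUTF8 setting changes, one can start a child interpreter with the variable set and ask it what it reports. This is a rough sketch assuming Python 3.7 or later; probe and PROBE are names invented for this example.

```python
import os
import subprocess
import sys

# One-liner run in the child: print the UTF-8 mode flag and the preferred encoding.
PROBE = (
    "import locale, sys; "
    "print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"
)


def probe(extra_env):
    """Run PROBE in a fresh interpreter with `extra_env` added to the environment."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").split()


if __name__ == "__main__":
    print("current environment:", probe({}))
    print("with PYTHONUTF8=1  :", probe({"PYTHONUTF8": "1"}))
```

With PYTHONUTF8=1 the child should report the flag as 1 and a UTF-8 preferred encoding; the exact spelling (UTF-8 or utf-8) differs between Python versions.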

score 0 · Answer 4 · edited Aug 09 '23 at 10:20

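Solution: override locale.getpreferredencoding so that it always reports UTF-8. The real function accepts an optional do_setlocale argument, so the replacement should accept it too. Note that this only affects code that calls locale.getpreferredencoding() after the patch; streams that are already open, such as sys.stdout, keep their encoding.

import locale

locale.getpreferredencoding = lambda do_setlocale=True: "UTF-8"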

answered Aug 09 '23 at 10:15 by ISLAM · edited Aug 09 '23 at 10:20

