8

Whenever I try to read UTF-8 encoded text files using open(file_name, encoding='utf-8'), I always get an error saying the ASCII codec can't decode some characters (e.g. when using for line in f: print(line)).

Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> import sys
>>> sys.getfilesystemencoding()
'ascii'
>>>

and the locale command prints:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE=en_HK.UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Vandan Revanur
jm33_m0

4 Answers

7

I had a similar problem. For me, the environment variable LANG was initially not set (you can check this by running env):

$ python3 -c 'import locale; print(locale.getdefaultlocale())'
(None, None)
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968

The available locales for me were (on a fresh Ubuntu 18.04 Docker image):

$ locale -a
C
C.UTF-8
POSIX

So I picked the UTF-8 one:

$ export LANG="C.UTF-8"

And then things work:

$ python3 -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
UTF-8

If you pick a locale that is not available, such as

$ export LANG="en_US.UTF-8"

it will not work:

$ python3 -c 'import locale; print(locale.getdefaultlocale())'
('en_US', 'UTF-8')
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
ANSI_X3.4-1968

This is also why the locale command prints these error messages:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
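
To check ahead of time whether a locale name is actually installed, one option is to ask the C library to switch to it and see whether that fails. This is only a minimal sketch: the helper name locale_is_usable is made up here for illustration, and it relies on nothing beyond locale.setlocale and locale.Error from the standard library.

```python
import locale


def locale_is_usable(name):
    """Return True if the C library accepts `name` for LC_CTYPE.

    `locale_is_usable` is a hypothetical helper, not part of the stdlib.
    """
    # Calling setlocale without a second argument only queries the current value.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        # Raised when the requested locale is not installed on this system.
        return False
    finally:
        # Put the previous setting back so the check has no side effects.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    for candidate in ("C", "C.UTF-8", "en_US.UTF-8"):
        print(candidate, locale_is_usable(candidate))
```

On an image like the one above, I would expect this to accept C.UTF-8 and reject en_US.UTF-8, in line with what locale -a lists.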
RasmusWL
1

I solved it by running the following:

apt install locales-all
Vandan Revanur
mati kepa
  • I had this issue with a Docker scratch image, so I needed to copy `/usr/lib/locale` from another image. – Winand Apr 03 '23 at 15:09
0

By default, Python tries to honor the Unix locale system, including the LC_ALL, LC_CTYPE, and LANG environment variables. In theory, standards are good, but in my experience these variables only cause problems. They're sometimes set to ridiculous values, like non-UTF-8 character sets, for no good reason. Then Python throws errors when print()ing non-ASCII text.

You can fix this by finding out what these environment variables are set to, and why, and changing them to something Unicode-capable. But system configuration can be a can of worms.
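
If you do want to look at what is set, a small report along these lines may help. It is a sketch only: locale_report is a name invented for this example, and the three variables are listed in the order the locale system consults them for character handling (LC_ALL first, then LC_CTYPE, then LANG).

```python
import locale
import os
import sys


def locale_report(environ=None):
    """Collect the locale variables and the encodings Python derived from them.

    `locale_report` is a hypothetical helper written for this example.
    """
    environ = os.environ if environ is None else environ
    # None means the variable is not set at all.
    report = {name: environ.get(name) for name in ("LC_ALL", "LC_CTYPE", "LANG")}
    # Encoding used by open() when no encoding argument is given.
    report["preferred_encoding"] = locale.getpreferredencoding(False)
    # Encoding used by print(); may be None if stdout was replaced.
    report["stdout_encoding"] = getattr(sys.stdout, "encoding", None)
    report["filesystem_encoding"] = sys.getfilesystemencoding()
    return report


if __name__ == "__main__":
    for key, value in locale_report().items():
        print(key, "=", value)
```

An encoding of ANSI_X3.4-1968 or ascii in this report, next to a LANG that names a UTF-8 locale, suggests the named locale is not installed.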

Python 3.7 and later offer these two quick fixes:

  • Set PYTHONUTF8=1 in the environment when running this script.

  • If you can't do that, then early in your script, force stdout to be UTF-8 by doing

    import sys
    
    sys.stdout.reconfigure(encoding='utf-8')
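
A slightly more defensive variant of the second fix, as a sketch: it only reconfigures when the stream is not already UTF-8 and when it has a reconfigure method at all (stdout may have been replaced by an object without one). The helper name ensure_utf8 is an assumption made for this example, and the in-memory stream is there purely to demonstrate the effect.

```python
import io
import sys


def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if needed (requires Python 3.7+).

    `ensure_utf8` is a hypothetical helper, not a standard-library function.
    """
    encoding = getattr(stream, "encoding", None) or ""
    normalized = encoding.lower().replace("-", "").replace("_", "")
    # Skip streams that are already UTF-8 or that cannot be reconfigured.
    if normalized != "utf8" and hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8")
    return stream


# Demonstration on an in-memory stream that starts out as ASCII.
buffer = io.BytesIO()
ascii_stream = io.TextIOWrapper(buffer, encoding="ascii")
ensure_utf8(ascii_stream)
ascii_stream.write("naïve café")  # would raise UnicodeEncodeError under ASCII
ascii_stream.flush()

# Early in a real script, the call would be:
ensure_utf8(sys.stdout)
```

This changes only the one stream; files opened without an explicit encoding still follow the locale unless PYTHONUTF8=1 is set.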
    
Jason Orendorff
0

Solution:

import locale 

locale.getpreferredencoding = lambda: "UTF-8"
ISLAM