How can I properly set utf-8 locale for Python in a docker container?

Question

I am trying to run a Python script in a Docker container.

I am using the NVIDIA container image for PyTorch, release 19.05, which provides Ubuntu 16.04 with a Python 3.6 environment.

Following the advice in a similar question, I set the environment variable with -e PYTHONIOENCODING=utf-8 when starting the container:

nvidia-docker run -dit --name teddy -p 8122:22 -e PYTHONIOENCODING=utf-8 1e0071d37342

The locale inside the container looks correct:

root@ce83e4a4301a:/workspace# locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8

Even so, I still get this error:

root@ce83e4a4301a:/workspace/paddlespeech/examples/other/tts_finetune/tts3# ./run_en.sh 
check oov
Traceback (most recent call last):
  File "local/check_oov.py", line 240, in <module>
    lang=args.lang)
  File "local/check_oov.py", line 161, in get_check_result
    pronunciation_phones = get_pronunciation_phones(lexicon_file)
  File "local/check_oov.py", line 99, in get_pronunciation_phones
    for line in f2.readlines():
  File "/opt/conda/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 6269: ordinal not in range(128)

(The code runs fine on the same machine outside the container; it fails only inside the container.)

The code that raises the error is:

...
    with open(lexicon_file, "r") as f2:
        for line in f2.readlines():
...

The problem goes away when I manually add the argument encoding="utf-8":

...
    with open(lexicon_file, "r", encoding="utf-8") as f2:
        for line in f2.readlines():
...
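
For context on why the explicit argument makes a difference: in Python 3.6, open() in text mode without encoding= uses locale.getpreferredencoding(False), while PYTHONIOENCODING only affects stdin, stdout and stderr. The sketch below is a diagnostic rather than a fix, and the helper names describe_encodings and read_lines are made up for illustration. Running it in the same environment that launches the failing script should show which encoding that process actually sees.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encoding-related settings visible to this Python process."""
    return {
        # Default used by open() in text mode when encoding= is omitted.
        "open_default": locale.getpreferredencoding(False),
        # Encoding of standard output; PYTHONIOENCODING overrides this one,
        # not the default used by open().
        "stdout": getattr(sys.stdout, "encoding", None),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LANG": os.environ.get("LANG"),
    }


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding, independent of the locale."""
    with open(path, "r", encoding=encoding) as handle:
        return handle.readlines()


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, "=", value)
```

If open_default reports an ASCII codec (for example ANSI_X3.4-1968), the process was started with a different locale than the one shown by the interactive shell.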

If you are using `print s1` syntax, you are using Python 2. I don't think it supported `PYTHONIOENCODING`. But then the traceback in the picture is incorrect, because it shows Python 3.6. (Anyhow, please [don’t post images of code, error messages, or other textual data.](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors)) — tripleee, Nov 07 '22 at 10:26
Thank you, I have realized that. But I still got the same problem with the code run in Python 3.6. Besides, I will paste my console errors next time. Thank you for your suggestion! — Yy X, Nov 08 '22 at 00:31
You can still fix _this_ question for the benefit of new visitors with the same problem, or visitors with an idea about how to fix yours. It would be a shame if somebody who knows the answer wasn't able to realize that because they were unable or unwilling to view the image for whatever technical or physical reasons. — tripleee, Nov 08 '22 at 05:23
Thank you for your suggestions again! I have edited this question to fix it. — Yy X, Nov 09 '22 at 06:43
Except now we can't see the code which produced the traceback, so this is now unclear for that reason. Please review the [help] and in particular [How to ask](/help/how-to-ask) as well as the guidance for providing a [mre]. — tripleee, Nov 09 '22 at 06:47

score 0 · Answer 1 · answered Nov 07 '22 at 03:05

You should create your string as a bytes literal with a b prefix, then decode it as UTF-8:

>>> b"(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89".decode("utf-8")
'(｡･ω･｡)ﾉ'
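
As a small sketch of how this relates to the traceback in the question, the same bytes can be decoded with both codecs to compare the results. The helper try_decode is a hypothetical name used only for this illustration.

```python
# The bytes from the example above: UTF-8 encoded, starting with "(" then 0xef.
raw = b"(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) if decoding fails."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as UTF-8 succeeds and round-trips back to the same bytes.
utf8_text, utf8_error = try_decode(raw, "utf-8")

# Decoding as ASCII fails on the first byte >= 0x80, as in the traceback.
ascii_text, ascii_error = try_decode(raw, "ascii")
```

Here ascii_error is a UnicodeDecodeError whose start attribute points at the first non-ASCII byte (0xef), the same kind of failure the ascii codec reports in the question.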

ti7

Thank you, this is helpful for my second problem. I will edit the question to focus on the main problem. – Yy X Nov 07 '22 at 06:17
