I am trying to run a Python script in a Docker container.

I am using the NVIDIA container image for PyTorch, release 19.05, which provides Ubuntu 16.04 with a Python 3.6 environment.

Following the advice in a similar question, I added the environment variable -e PYTHONIOENCODING=utf-8 when starting the container:

nvidia-docker run -dit --name teddy -p 8122:22 -e PYTHONIOENCODING=utf-8 1e0071d37342

I checked the locale inside the container, and it looks correct:

root@ce83e4a4301a:/workspace# locale
LANG=
LANGUAGE=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=C.UTF-8

However, I still get this error:

root@ce83e4a4301a:/workspace/paddlespeech/examples/other/tts_finetune/tts3# ./run_en.sh 
check oov
Traceback (most recent call last):
  File "local/check_oov.py", line 240, in <module>
    lang=args.lang)
  File "local/check_oov.py", line 161, in get_check_result
    pronunciation_phones = get_pronunciation_phones(lexicon_file)
  File "local/check_oov.py", line 99, in get_pronunciation_phones
    for line in f2.readlines():
  File "/opt/conda/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 6269: ordinal not in range(128)

(The code runs fine directly on the same machine, but fails inside the container.)
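To compare the two environments, a small diagnostic sketch like the one below can be run both on the host and inside the container. It relies on two documented behaviours: open() without an encoding argument falls back to locale.getpreferredencoding(False), while PYTHONIOENCODING only affects the standard streams. The helper name report_encodings is hypothetical and is not part of the script in question.

```python
import locale
import sys


def report_encodings():
    """Collect the default encodings Python uses in this environment."""
    return {
        # Used by open() in text mode when no encoding= argument is given.
        "open() default": locale.getpreferredencoding(False),
        # Influenced by PYTHONIOENCODING; applies to the standard streams only.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used for file names, not for file contents.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name}: {value}")
```

If the "open() default" entry differs between the host and the container, that would explain why the same code behaves differently in the two places.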

The relevant code is:

...
    with open(lexicon_file, "r") as f2:
        for line in f2.readlines():
...

The problem goes away if I manually add the argument encoding="utf-8":

...
    with open(lexicon_file, "r", encoding="utf-8") as f2:
        for line in f2.readlines():
...
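
The difference between the two variants can be reproduced outside the project with a minimal sketch. The sample lexicon content below is hypothetical; it only assumes that the real file contains at least one non-ASCII character whose UTF-8 encoding starts with byte 0xef, as the traceback suggests. Passing encoding="ascii" explicitly stands in for an environment where ASCII is the default.

```python
import os
import tempfile


def read_lines(path, encoding):
    """Read all lines of a text file using an explicit encoding."""
    with open(path, "r", encoding=encoding) as f:
        return f.readlines()


# Hypothetical lexicon content: U+FF0C (fullwidth comma) is EF BC 8C in UTF-8.
sample = "hello HH AH0 L OW1\nworld\uff0c W ER1 L D\n"

fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

try:
    read_lines(path, "ascii")
    ascii_failed = False
    bad_byte = None
except UnicodeDecodeError as exc:
    # exc.object holds the raw bytes, exc.start the offset of the bad byte.
    ascii_failed = True
    bad_byte = exc.object[exc.start]
    print("ascii failed:", exc)

# The same file decodes cleanly once the encoding is stated explicitly.
utf8_lines = read_lines(path, "utf-8")
print(utf8_lines)

os.remove(path)
```

Stating the encoding in the open() call makes the result independent of whatever default the container happens to provide.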
  • If you are using `print s1` syntax, you are using Python 2. I don't think it supported `PYTHONIOENCODING`. But then the traceback in the picture is incorrect, because it shows Python 3.6. (Anyhow, please [don’t post images of code, error messages, or other textual data.](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors)) – tripleee Nov 07 '22 at 10:26
  • Thank you, I have realized that. But I still got the same problem with the code run in Python 3.6. Besides, I will paste my console errors next time. Thank you for your suggestion! – Yy X Nov 08 '22 at 00:31
  • You can still fix _this_ question for the benefit of new visitors with the same problem, or visitors with an idea about how to fix yours. It would be a shame if somebody who knows the answer wasn't able to realize that because they were unable or unwilling to view the image for whatever technical or physical reasons. – tripleee Nov 08 '22 at 05:23
  • Thank you for your suggestions again! I have edited this question to fix it. – Yy X Nov 09 '22 at 06:43
  • Except now we can't see the code which produced the traceback, so this is now unclear for that reason. Please review the [help] and in particular [How to ask](/help/how-to-ask) as well as the guidance for providing a [mre]. – tripleee Nov 09 '22 at 06:47

1 Answer

You should create your string as a bytes literal by adding a b prefix, then decode it as UTF-8:

>>> b"(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89".decode("utf-8")
'(｡･ω･｡)ﾉ'
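
To illustrate why the b prefix matters, here is a minimal sketch using only the first three bytes of the example above. Without the prefix, each \x escape becomes a separate code point rather than a raw byte. The Latin-1 round trip shown for repairing an already-created str is an addition for illustration, not part of the original suggestion.

```python
# Bytes literal: three raw bytes that form one UTF-8 encoded character.
raw = b"\xef\xbd\xa1"

# Str literal: three separate code points (U+00EF, U+00BD, U+00A1).
text = "\xef\xbd\xa1"

# Decoding the bytes yields a single character, U+FF61.
decoded = raw.decode("utf-8")

# A str built without the prefix can be mapped back to the same bytes,
# because Latin-1 maps code points 0-255 to identical byte values.
repaired = text.encode("latin-1").decode("utf-8")

print(len(text), len(decoded))
print(decoded == repaired)
```

The str version has length 3 and the decoded version has length 1, which is why the two literals print differently even though their escapes look identical.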
  • Thank you, this is helpful for my second problem. I will edit the question to focus on the main problem. – Yy X Nov 07 '22 at 06:17