Consider the following Python program, which prints the command-line arguments it received:

#!/usr/bin/env python3
print(repr(__import__("sys").argv))

Here's what happens when I run it with a Chinese character as an argument:

$ /tmp/mytest 我
['/tmp/mytest', '我']

Now, consider the following Dockerfile that puts it in /tmp/mytest:

FROM ubuntu:18.04
RUN apt-get update && apt-get install -y python3
RUN echo '#!/usr/bin/env python3' >> /tmp/mytest
RUN echo 'print(repr(__import__("sys").argv))' >> /tmp/mytest
RUN chmod +x /tmp/mytest

When I try to run it, the output differs:

$ sudo docker build -t mytest .
Sending build context to Docker daemon  20.48kB
Step 1/5 : FROM ubuntu:18.04
 ---> 02f9d6707661
Step 2/5 : RUN apt-get update && apt-get install -y python3
 ---> Using cache
 ---> 5c9a6768a337
Step 3/5 : RUN echo '#!/usr/bin/env python3' >> /tmp/mytest
 ---> Using cache
 ---> e0410fc9684e
Step 4/5 : RUN echo 'print(repr(__import__("sys").argv))' >> /tmp/mytest
 ---> Using cache
 ---> d123c9645c5c
Step 5/5 : RUN chmod +x /tmp/mytest
 ---> Using cache
 ---> 9b2ac9b174e0
Successfully built 9b2ac9b174e0
Successfully tagged mytest:latest
$ sudo docker run -ti mytest /tmp/mytest 我
['/tmp/mytest', '\udce6\udc88\udc91']

Why is that? Is Docker or Python to blame here? How do I make the script work the same way in both cases?

d33tah

1 Answer

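Neither Docker nor Python is broken here; the container simply has no UTF-8 locale. The stock ubuntu:18.04 image sets no locale, so Python 3 (3.6 on that release) falls back to the C locale and decodes sys.argv as ASCII with the surrogateescape error handler. Each of the three UTF-8 bytes of 我 (e6 88 91) then shows up as a lone surrogate, which is exactly the '\udce6\udc88\udc91' in the output.

The problem goes away if you generate and export a UTF-8 locale, as in the following Dockerfile: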

FROM ubuntu:18.04
RUN apt-get update && apt-get install -y python3 locales
RUN echo '#!/usr/bin/env python3' >> /tmp/mytest
RUN echo 'print(repr(__import__("sys").argv))' >> /tmp/mytest
RUN locale-gen en_US.UTF-8
ENV LC_ALL=en_US.UTF-8
RUN chmod +x /tmp/mytest
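
To see where the odd-looking string comes from without Docker, the decoding can be reproduced by hand. This is only a sketch of what I believe Python does under an ASCII locale; recover_utf8 is a hypothetical helper name of my own, not something from the standard library, and it assumes the original arguments were UTF-8:

```python
# The UTF-8 bytes the shell passes for the argument 我 (U+6211).
raw = "\u6211".encode("utf-8")          # b'\xe6\x88\x91'

# Under an ASCII (C/POSIX) locale, non-ASCII bytes cannot be decoded,
# so the surrogateescape handler maps each byte 0xNN to U+DCNN.
decoded = raw.decode("ascii", "surrogateescape")
print(repr(decoded))                     # '\udce6\udc88\udc91'


def recover_utf8(arg):
    """Hypothetical helper: turn escaped surrogates back into bytes,
    then decode them as UTF-8 (assumes the input really was UTF-8)."""
    return arg.encode("utf-8", "surrogateescape").decode("utf-8")


# Works on the mangled string and leaves a correctly decoded one alone.
print(recover_utf8(decoded) == "\u6211")   # True
print(recover_utf8("\u6211") == "\u6211")  # True
```

Applying something like recover_utf8 to each element of sys.argv should make the script behave the same with or without the locale fix, but setting the locale in the image is the less intrusive option.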
d33tah