I have a Flask application (Linux, Apache with mod_wsgi, Python 3) which calls a shell script with some arguments. When there are any non-ASCII characters in the subprocess.run() command arguments, the following error occurs in the application:

'ascii' codec can't encode characters in position 5-6: ordinal not in range(128)

I spent a lot of time trying to fix it.

No such problem exists on the command line, only in the application.

The entire application's output is in Unicode and there are no problems with it. After some research I came to the conclusion the problem is with the "filesystem encoding".

I have added some logging statements to my run.wsgi script. The FS encoding was indeed 'ascii' (and 'utf-8' on the command line).
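
A check of this kind might look like the sketch below. It is not the original logging code from run.wsgi; the helper name `encoding_report` is hypothetical, and it only uses standard-library calls to collect the values that usually differ between an interactive shell and a process started by Apache.

```python
import locale
import os
import sys

def encoding_report():
    """Collect the encoding-related settings of the current process.

    Hypothetical helper for illustration: compare its output when run
    from a shell with its output when run inside the WSGI application.
    """
    return {
        # Encoding Python uses for file names and command-line arguments.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding derived from the locale, without changing the locale.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Environment variables that normally select the locale.
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }

report = encoding_report()
```

Writing `report` to the application log from both environments makes it easy to see whether the difference comes from the locale variables or from somewhere else.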

Next I found this question: "How to change file system encoding via python?"

The Apache httpd server was started with LANG=C in its environment. I changed it to C.UTF-8 despite the warnings in /etc/sysconfig/httpd. That did not help; the FS encoding was still 'ascii'. I then even monkey-patched sys.getfilesystemencoding() to lambda: 'utf-8', but the error is still there.

I have properly restarted the httpd service after each change.

I'm at my wits' end.

  1. Is my problem really caused by the FS encoding?
  2. If yes, why did my attempts to change it to utf-8 fail?
  3. Most importantly: How can I solve this issue?

UPDATE1:

code snippet:

    import subprocess as sub
    cmdresult = sub.run(
        [SCRIPT, tid, days, name],
        stdin=sub.DEVNULL, stdout=sub.PIPE, stderr=sub.DEVNULL,
        encoding='ascii', # 'utf-8' will not help, this affects stdin, stdout I/O only
        check=True)
Graham Dumpleton
VPfB
  • Are you invoking using `shell=True` or not? Are you passing arguments as a string or a list? Please show the actual code that's using the `subprocess` module. – Daniel Pryden Dec 22 '17 at 19:01
  • At the OS level, `exec()` and friends don't care what encoding you used for the arguments you pass to a subprocess: POSIX just requires that they're representable as a C `char*` string, and it's up to the subprocess to decode them. – Daniel Pryden Dec 22 '17 at 19:04
  • @DanielPryden Code appended, `shell=False` by default. – VPfB Dec 22 '17 at 19:24

2 Answers

(Answering my own question, hoping it could be helpful to others)

I made a short test program. This is what I have found:

  1. File system encoding is the key point.
  2. Monkey patching does not work. Well, that's OK. It is not acceptable as a solution anyway.
  3. LANG=C.UTF-8 requires the locale to be installed, and it was not on my system (checked with locale -a). On a second system where it was available, it worked.
  4. I can do the encoding explicitly and pass bytes as one of the args:

    cmdresult = sub.run(
        [SCRIPT, tid, days, name.encode('utf-8')],
        ...
    
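
For anyone who wants to try the bytes approach in isolation, here is a minimal, self-contained sketch. It assumes a POSIX system and substitutes a tiny Python one-liner for the real shell script; the function name `run_with_bytes_arg` and the sample string are made up for illustration.

```python
import subprocess as sub
import sys

def run_with_bytes_arg(name):
    """Pass a non-ASCII argument to a child process as UTF-8 bytes.

    Encoding explicitly means the parent never has to encode the
    argument with its own (possibly 'ascii') filesystem encoding.
    """
    # Stand-in for the shell script: write argv[1] back as raw bytes.
    child = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"
    result = sub.run(
        [sys.executable, "-c", child, name.encode("utf-8")],
        stdin=sub.DEVNULL, stdout=sub.PIPE, stderr=sub.DEVNULL,
        check=True)
    # No `encoding=` was given, so stdout is returned as bytes.
    return result.stdout

echoed = run_with_bytes_arg("žluťoučký")
```

The child receives exactly the bytes that were passed, so it is up to the called script to interpret them as UTF-8.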

This works, but one question remained:

Does it comply with the docs?

All I could find is:

args should be a sequence of program arguments or else a single string

I understood this as one string or a list of strings, but it does not actually specify what types the sequence may contain. I also passed an int to see what would happen. I got this error:

expected str, bytes or os.PathLike object

So my solution seems to be fine.
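
The experiment can be repeated with a small sketch like the one below. It assumes a POSIX system, uses the Python interpreter itself as a harmless command, and the helper name `arg_type_accepted` is hypothetical.

```python
import subprocess as sub
import sys

def arg_type_accepted(arg):
    """Return True if subprocess.run() accepts `arg` as a list element."""
    try:
        # The child does nothing; only the argument conversion matters.
        sub.run([sys.executable, "-c", "pass", arg],
                stdin=sub.DEVNULL, stdout=sub.DEVNULL, stderr=sub.DEVNULL,
                check=True)
    except TypeError:
        # Raised before the child starts when the type is unsupported.
        return False
    return True

results = {type(a).__name__: arg_type_accepted(a) for a in ("text", b"bytes", 42)}
```

Here `results` records, per type, whether the argument was accepted, which matches the error message: str and bytes elements pass, an int does not.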

VPfB

In the context of mod_wsgi, you should ensure you are using mod_wsgi daemon mode and set the lang/locale for the mod_wsgi daemon process group. For a much more detailed explanation which is too much to repeat here, see:

Graham Dumpleton
  • I read your blog post about daemon mode just a few days ago and put the change from embedded to daemon mode high on my list. Thank you for writing these posts. Regarding this problem, I think it has very little to do with `mod_wsgi`. I found a Python solution. – VPfB Dec 23 '17 at 08:23