
Yet another encoding question on Python.

How can I pass non-ASCII characters as parameters on a subprocess.Popen call?

My problem is not with stdin/stdout, as in the majority of other questions on Stack Overflow, but with passing those characters in the args parameter of Popen.

Python script used for testing:

import subprocess

cmd = 'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'

process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output, err = process.communicate()
result = process.wait()

print result, '-', output

For this example call, script.py receives the argument garbled, as TestÃ§ on Ã£ and Ãª instead of Testç on ã and ê. If I copy-paste this same command string into a CMD shell, it works fine.

What I've tried, besides what's described above:

  1. Checked if all Python scripts are encoded in UTF-8. They are.
  2. Changed to unicode (cmd = u'...'), received a UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 128: ordinal not in range(128) on line 5 (the Popen call).
  3. Changed to cmd = u'...'.decode('utf-8'), received the same UnicodeEncodeError, this time on line 3 (the decode call).
  4. Changed to cmd = u'...'.encode('utf8'), which results in the same garbled TestÃ§ on Ã£ and Ãª.
  5. Added the PYTHONIOENCODING=utf-8 environment variable, with no luck.

Looking at tries 2 and 3, it seems like Popen issues a decode call internally, but I don't have enough experience in Python to advance based on this suspicion.
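For what it's worth, both symptoms can be reproduced in isolation with a small UTF-8-encoded script (just a sketch, assuming the server's ANSI code page is the Western cp1252): Python 2's default ascii codec raises the same error as tries 2 and 3, and decoding the argument's UTF-8 bytes with cp1252 yields the same garbled text as try 4.

# -*- coding: utf-8 -*-
arg = u'Testç on ã and ê'

# Tries 2 and 3: Python 2 falls back to the ascii codec whenever it has to
# turn a unicode string into bytes implicitly, which gives the same error.
try:
    arg.encode('ascii')
except UnicodeEncodeError as e:
    print e
    # 'ascii' codec can't encode character u'\xe7' in position 4: ordinal not in range(128)

# Try 4: the UTF-8 bytes of the argument, decoded with the ANSI code page
# (cp1252 here), give exactly the garbled text that script.py receives.
print repr(arg.encode('utf-8').decode('cp1252'))
# u'Test\xc3\xa7 on \xc3\xa3 and \xc3\xaa', i.e. TestÃ§ on Ã£ and Ãª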

Environment: Python 2.7.11 running on Windows Server 2012 R2.

I've searched for similar problems but haven't found a solution. A similar question is asked in "what is the encoding of the subprocess module output in Python 2.7?", but no viable solution is offered there.

I read that Python 3 changed the way string and encoding works, but upgrading to Python 3 is not an option currently.

Thanks in advance.

Dinei
    If Python 3 isn't an option, then you'll have to use ctypes. In Python 2 `Popen` calls WinAPI `CreateProcessA`. The "A" suffix means this function decodes the command-line as an ANSI string (e.g. codepage 1252 in Western Europe) into a native UTF-16LE string. Almost all string handling in Windows and the kernel is UTF-16LE. The non-Unicode codepage encodings are a legacy from DOS and Windows 9x. Their primary use nowadays is to transform UTF-8 into meaningless mojibake... – Eryk Sun Jan 23 '18 at 03:35
  • In Python 3, `Popen` calls `CreateProcessW`, passing the command line as a native, UTF-16LE string. The CMD shell is also a Unicode application (since 1993) that calls wide-character `CreateProcessW`. – Eryk Sun Jan 23 '18 at 03:36
  • When CMD has to encode and decode strings (e.g. reading a batch script), it uses the legacy console codepage. So it may be possible to run `cmd /k chcp.com 65001` with stdin set to a pipe, and then pipe it the command line as a UTF-8 string. – Eryk Sun Jan 23 '18 at 03:43
  • Sorry, that doesn't work, so it's back to ctypes. After changing the console codepage, CMD does try to decode the piped string as UTF-8, but it does so one byte at a time while reading from the pipe, rather than decoding a line at a time. Obviously this fails for non-ASCII characters that use 2-4 bytes per character. – Eryk Sun Jan 23 '18 at 04:07
  • @eryksun Thanks, that's some useful information. Please post it as an answer and I'll accept it... Comments on stackoverflow are somewhat transient. – Dinei Jan 25 '18 at 14:51

1 Answer


As noted in the comments, subprocess.Popen in Python 2 calls the Windows function CreateProcessA, which accepts a byte string encoded in the currently configured ANSI code page. Luckily Python has an encoding named mbcs that stands in for the current code page.

cmd = u'C:\Python27\python.exe C:\path_to\script.py -n "Testç on ã and ê"'.encode('mbcs')

Unfortunately you can still fail if the string contains characters that can't be encoded into the current code page.
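If you do hit characters that the code page can't represent, the ctypes route described in the comments avoids the ANSI conversion entirely by calling CreateProcessW with a wide (UTF-16LE) command line. Below is a minimal sketch of that approach; the run_unicode_cmdline helper name is just for illustration, it only waits for the child and returns its exit code, and it does not capture stdout the way the Popen example does.

# -*- coding: utf-8 -*-
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

class STARTUPINFOW(ctypes.Structure):
    _fields_ = [('cb', wintypes.DWORD),
                ('lpReserved', wintypes.LPWSTR),
                ('lpDesktop', wintypes.LPWSTR),
                ('lpTitle', wintypes.LPWSTR),
                ('dwX', wintypes.DWORD),
                ('dwY', wintypes.DWORD),
                ('dwXSize', wintypes.DWORD),
                ('dwYSize', wintypes.DWORD),
                ('dwXCountChars', wintypes.DWORD),
                ('dwYCountChars', wintypes.DWORD),
                ('dwFillAttribute', wintypes.DWORD),
                ('dwFlags', wintypes.DWORD),
                ('wShowWindow', wintypes.WORD),
                ('cbReserved2', wintypes.WORD),
                ('lpReserved2', ctypes.POINTER(ctypes.c_byte)),
                ('hStdInput', wintypes.HANDLE),
                ('hStdOutput', wintypes.HANDLE),
                ('hStdError', wintypes.HANDLE)]

class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [('hProcess', wintypes.HANDLE),
                ('hThread', wintypes.HANDLE),
                ('dwProcessId', wintypes.DWORD),
                ('dwThreadId', wintypes.DWORD)]

def run_unicode_cmdline(cmdline):
    # CreateProcessW may modify the command line in place, so pass a
    # mutable wide-character buffer rather than the unicode string itself.
    buf = ctypes.create_unicode_buffer(cmdline)
    si = STARTUPINFOW()
    si.cb = ctypes.sizeof(si)
    pi = PROCESS_INFORMATION()
    ok = kernel32.CreateProcessW(None, buf, None, None, False, 0,
                                 None, None, ctypes.byref(si), ctypes.byref(pi))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    kernel32.WaitForSingleObject(pi.hProcess, 0xFFFFFFFF)  # INFINITE
    code = wintypes.DWORD()
    kernel32.GetExitCodeProcess(pi.hProcess, ctypes.byref(code))
    kernel32.CloseHandle(pi.hProcess)
    kernel32.CloseHandle(pi.hThread)
    return code.value

cmd = u'C:\\Python27\\python.exe C:\\path_to\\script.py -n "Testç on ã and ê"'
print run_unicode_cmdline(cmd)

The trade-off is that you sidestep the ANSI code page completely but have to re-implement any stdout/stderr plumbing that Popen would otherwise handle for you.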

Mark Ransom
  • 299,747
  • 42
  • 398
  • 622