1

I'm currently migrating a script from Perl to Python 3 (3.6.5). It is running on Windows Server 2016. The script builds a command line with arguments and executes the resulting string with subprocess.check_output. One of the arguments is called -location:"my street". The location can contain special characters such as umlauts (äöß) or other accented letters (áŠ).

When I run the Perl script the special chars are passed correctly to the application. When I run the Python script the special chars are replaced by question marks in the application. I think the called application needs a UTF-8 encoded argument string.

The Perl script runs in UTF-8 mode

use UTF8;
binmode( STDOUT, ":utf-8" );

The Python script is created with PyCharm, UTF-8 encoded and the first line of the script contains

# -*- coding: utf-8 -*-

I tried several things to set the encoding of the subprocess arguments to UTF-8, but none of them worked. I used procmon.exe to compare the application call made by the Perl script with the one made by the Python script. The command line that procmon displays for the Python subprocess call is readable to me; the one for the working Perl call is not. For the Perl script, the location string looks like this in procmon:

-location:"HQ/äöööStraße".

The Perl code looks like this:

$command = "C:\\PROGRAM FILES\\Application\\bin\\cfg.exe";
$operand = "-modify -location:123á456ß99";
$result  = `$command $operand`;

The Python code looks like this:

# -*- coding: utf-8 -*-
import subprocess
result = subprocess.check_output(['C:\\PROGRAM FILES\\Application\\bin\\cfg.exe', "-modify", "-location:123á456ß99"], shell=False, stderr=subprocess.STDOUT)

Any idea what I have to do so that the Python arguments are passed correctly to the application?

Konrad Rudolph
Jens

2 Answers

7

In Python 3.3+ you can separately indicate that you expect text in a particular encoding. The keyword argument universal_newlines=True was renamed in 3.7 to the more accurate and transparent text=True.

This keyword basically says "just use whatever encoding is default on my system" (so UTF-8 on anything reasonably modern, except on Windows, where you get the system's default code page).

In the absence of this keyword, subprocesses receive and return bytes in Python 3.

Of course, if you know the encoding, you can also separately .decode() the bytes you get back.

If you know the encoding it's probably useful to use the encoding= keyword argument (even if you assume it is also the system encoding; this was added in Python 3.6).

response = subprocess.check_output([...], text=True)
response = subprocess.check_output([...], encoding='utf-8')
response = subprocess.check_output([...]).decode('utf-8')
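The variants above can be tried against a small child process. The following is only a sketch: the child is a Python one-liner standing in for a real tool (an assumption, not part of the original question), and it writes the UTF-8 bytes of "äöß" to its standard output. The text=True variant is left out because it needs Python 3.7 and depends on the system default encoding.

```python
import subprocess
import sys

# Hypothetical stand-in for a real tool: a child Python process that
# writes the UTF-8 bytes of "äöß" to stdout. The child code is pure
# ASCII, so the argument passing itself cannot distort it.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('\\u00e4\\u00f6\\u00df'.encode('utf-8'))",
]

raw = subprocess.check_output(child)                         # bytes, undecoded
decoded = subprocess.check_output(child, encoding="utf-8")   # str, decoded by subprocess
manual = subprocess.check_output(child).decode("utf-8")      # str, decoded by hand

print(type(raw).__name__, type(decoded).__name__, decoded == manual)
```

Note that all of these options only affect how the child's output (and input) is decoded; they do not change how the argument list is handed to the child.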
tripleee
  • For (much) more, see also https://stackoverflow.com/questions/4256107/running-bash-commands-in-python – tripleee Oct 23 '19 at 12:44
  • I tested all three options. The result is the same as without these keywords. No improvement. – Jens Oct 23 '19 at 13:40
  • So what do you get back? A crash? Or junk data (what data exactly)? – tripleee Oct 23 '19 at 13:46
  • My issue is not the result I get back. It is the argument that is passed to the application I call with subprocess. The special chars are still readable in procmon and not encoded the way Perl does it. – Jens Oct 23 '19 at 14:03
  • So you expect `"-location:123á456ß99"` to be passed to the process as UTF-8? What is your system encoding? – tripleee Oct 23 '19 at 14:44
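For the system encoding asked about in the last comment, a quick check could look like the sketch below. It only uses standard-library calls; which values it prints depends entirely on the machine, so no expected output is given.

```python
import locale
import sys

# Default encoding Python uses for text-mode subprocess pipes
# (what text=True / universal_newlines=True fall back to).
preferred = locale.getpreferredencoding(False)

# Encoding Python uses for file names.
fs_encoding = sys.getfilesystemencoding()

print("preferred:", preferred)
print("filesystem:", fs_encoding)
```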
1

The trick to get the script running is to encode the arguments as 'utf8' and then decode them as 'ansi'.

import subprocess

command = r'C:\PROGRAM FILES\Application\bin\cfg.exe'
argument = ["-modify", "-location:123á456ß99"]

# Encode each argument as UTF-8, then decode those bytes as ANSI
# ('ansi' is a Windows-only alias for the 'mbcs' codec).
argument_ansi = []
for x in argument:
    argument_ansi.append(x.encode('utf-8').decode('ansi', 'replace'))
cmd = [command]
cmd.extend(argument_ansi)
result = subprocess.check_output(cmd, shell=False, encoding="utf-8", universal_newlines=True)
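What the round trip does to a string can be shown without running the application. This is a minimal sketch that assumes the ANSI code page is Windows-1252 and therefore names 'cp1252' explicitly instead of 'ansi'; the helper name utf8_as_codepage is made up for illustration.

```python
def utf8_as_codepage(text, codepage="cp1252"):
    """Encode text as UTF-8, then reinterpret those bytes in a single-byte code page."""
    return text.encode("utf-8").decode(codepage, "replace")


arg = "-location:123á456ß99"
converted = utf8_as_codepage(arg)

# Each non-ASCII character turns into two code-page characters.
print(converted)

# Encoding the converted string back to the code page yields the
# original UTF-8 bytes, as long as no byte had to be replaced.
print(converted.encode("cp1252") == arg.encode("utf-8"))
```

A caveat under the same assumption: Python's cp1252 codec leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so characters whose UTF-8 encoding contains one of them are replaced with U+FFFD and no longer survive the round trip.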
Jens
  • The definition of `ansi` depends on your system settings so this could be correct or horribly wrong. Probably better to spell out the actual encoding. Also understand that specifying the wrong encoding could remove the error message, but produce incorrect results. – tripleee Mar 26 '22 at 11:13