
I have a piece of code that does this:

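import subprocess

def command(self, s, level=1):
    # Text mode (universal_newlines=True) makes Python decode stdout/stderr implicitly.
    sub = subprocess.Popen(s, bufsize=0, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
    (out, err) = sub.communicate()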

Calling the communicate method raises this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 2: invalid start byte

Because universal_newlines=True is set, Popen reads the output as strings rather than bytes.

When it works, communicate() should return a tuple (stdoutdata, stderrdata).

shafi97
    Questions seeking debugging help (**"why isn't this code working?"**) should include the desired behavior, *a specific problem or error* and *the shortest code necessary* to reproduce it *as formatted text* **in the question itself**. Questions without **a clear problem statement** are not useful to other readers. See: [mre]. – MattDMo Nov 21 '20 at 22:28
  • Show us how you are running this? By default subprocess.communicate() returns bytes, i.e., it doesn't try to decode them. – Tim Nov 21 '20 at 22:29
  • as mentioned [here](https://stackoverflow.com/questions/44659851/unicodedecodeerror-utf-8-codec-cant-decode-byte-0x8b-in-position-1-invalid/44660123), I encountered this when a program running inside a Python `subprocess` spat out a block of compressed data to stdout amongst other normal text log lines – user5359531 Jun 09 '21 at 22:47

1 Answer


With the universal_newlines=True parameter (which has a more readable alias text=True since Python 3.7), input and output are en-/decoded implicitly by Python. You can tell Python which codec to use through the encoding= parameter. If you don't specify a codec, the same defaults are used as in io.TextIOWrapper.
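To see which codec applies when encoding= is omitted, and how to pin it down, here is a minimal sketch. It assumes that locale.getpreferredencoding(False) reports the fallback that text mode typically uses; the child command is a hypothetical stand-in that only prints ASCII text.

```python
import locale
import subprocess
import sys

# What text mode typically falls back to when encoding= is omitted
# (assumption: this mirrors the io.TextIOWrapper default on your system).
default_codec = locale.getpreferredencoding(False)
print("default codec:", default_codec)

# Hypothetical child process that prints plain ASCII text.
child = [sys.executable, "-c", "print('hello')"]

# Naming the codec explicitly removes the dependence on OS and locale.
result = subprocess.run(child, stdout=subprocess.PIPE, encoding="utf-8")
print(repr(result.stdout))
```

Passing encoding= switches the pipes to text mode on its own, so universal_newlines=True is not needed alongside it.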

The default codec depends on a number of factors (OS, locale, Python version), but in your case it is apparently UTF-8. However, your subprocess returns data which is not UTF-8 encoded. So you need to refer to the documentation of that command:

  • Does it return text in a Windows code page, e.g. CP-1252? Then pass that codec as the encoding= parameter in the subprocess.Popen call.
  • Does it return text at all? If not, omit the universal_newlines parameter and process the binary data returned as bytes objects.
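
Both options can be sketched as follows. The child command here is a hypothetical stand-in that writes "café" encoded as CP-1252, and CP-1252 is only an assumed codec for illustration; substitute whatever the documentation of your actual command specifies.

```python
import subprocess
import sys

# Hypothetical child: writes "café" encoded as CP-1252, not UTF-8.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'caf\\xe9')"]

# Option 1: the command returns text in a known codec, so name it.
sub = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                       encoding="cp1252")
out_text, err_text = sub.communicate()
print(repr(out_text))

# Option 2: no text mode at all; communicate() returns bytes objects.
sub = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out_bytes, err_bytes = sub.communicate()
print(repr(out_bytes))

# The same bytes are not valid UTF-8, which is the situation in the question.
try:
    out_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("valid UTF-8:", utf8_ok)
```

If the output mixes text with binary data, reading bytes and decoding only the textual parts yourself is probably the safer route.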
lenz