6

I have already read UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>. While the error message is similar, the code is completely different, because I use os.popen in this question, not open. I cannot use the answers from the other questions to solve this problem.

output = os.popen("dir").read()

This line, which is supposed to assign the output of command "dir" to variable "output", is causing this error:

'charmap' codec can't decode byte 0x88 in position 260: character maps to <undefined>

I think this might be happening because some files in the folder have letters such as ł, ą, ę and ć in their names. I have no idea how to fix this, though.

user202729
David
  • Possible duplicate of [UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>](http://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character) – DYZ Feb 04 '17 at 00:58
  • It may be because your title -- the exact error message -- is instant trigger bait. *By far the most* of similar questions are plain duplicates. If you edit your title and point out that it is specifically `os.popen`, a file system related function rather than local file encoding, you might get a better reception. (Unless it *is* the same issue -- I'm not sure about either.) – Jongware Feb 04 '17 at 13:26

3 Answers

6

os.popen is just a wrapper around subprocess.Popen combined with an io.TextIOWrapper object:

The returned file object reads or writes text strings rather than bytes.

If Python's default encoding doesn't work for you, you should use subprocess.Popen directly.

The underlying issue is that cmd writes text in a legacy code page (not Unicode) by default when its output goes to a pipe. This behavior may depend on your Windows version.

You can fix this by passing the /U flag to cmd:

import subprocess

# /u makes cmd write UTF-16LE to the pipe; /c runs the command and exits
p = subprocess.Popen('cmd /u /c dir', stdout=subprocess.PIPE)
result = p.communicate()
text = result[0].decode('utf-16-le')
Josh Lee
  • **subprocess.Popen()** is confusing. I already tried using it, but it never worked properly no matter how hard I tried. – David Feb 04 '17 at 14:30
  • The definition of os.popen shows the invocation that you will already have been using, so you can use that as a starting point. https://hg.python.org/cpython/file/3.6/Lib/os.py There's a Popen object whose stdout file you've been looking at only. – Josh Lee Feb 04 '17 at 14:31
  • Can't figure it out. So far only managed to cause the same error again. – David Feb 04 '17 at 14:45
  • cmd uses the wide-character API when writing to the console. So it's not "even when" but rather *because* the output is a pipe or file that cmd outputs ANSI, unless `/u` overrides this to write UTF-16LE. – Eryk Sun Feb 04 '17 at 17:22
  • Use a command line string when running cmd.exe / `shell=True`. `subprocess.list2cmdline` does not know how to quote and escape command lines for cmd.exe. – Eryk Sun Feb 04 '17 at 17:27
  • @eryksun Fair enough, I don't really understand any of that :) – Josh Lee Feb 05 '17 at 20:57
  • Actually, if there's an attached console, cmd.exe defaults to writing output to a pipe or file using the console's current output codepage, which defaults to OEM text (e.g. 852) instead of ANSI text (e.g. 1250). For example, in a Central Europe locale, "ł" is 0x88 in the OEM codepage 852, but 0x88 is undefined in the ANSI codepage 1250. This appears to be the source of the error reported by the OP. To handle this case when a program doesn't have something like cmd's `/u` option, Python 3.6 has added an "oem" encoding, and `subprocess.Popen` allows passing the `encoding` and `errors` to use. – Eryk Sun Feb 05 '17 at 23:18
  • Note that when cmd.exe writes to the console (i.e. when its `StandardOutput` is a handle for a screen buffer in the attached console, i.e. the conhost.exe process that owns the console window) instead of a pipe or file, it writes UTF-16 text using [`WriteConsoleW`](https://msdn.microsoft.com/en-us/library/ms687401). Thus interactive `dir` has no problem displaying Unicode characters in the basic multilingual plane, so long as they're supported by the console's current font (e.g. Consolas). – Eryk Sun Feb 05 '17 at 23:21
  • The output decoded as u16 consisted of Asian characters. In my case (running "git log" under Windows), switching to utf-8 worked fine. – Yiğit Feb 21 '18 at 22:49
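
The code page mismatch described in the comments above can be reproduced without running cmd at all. This is only a minimal sketch: it uses Python's built-in cp852 and cp1250 codecs as stand-ins for the OEM and ANSI code pages of a Central Europe locale, and the byte strings are hand-written samples rather than captured `dir` output. The last lines show why plain ASCII or UTF-8 bytes turn into CJK characters when they are decoded as UTF-16.

```python
# The single byte that triggers the error in the question.
raw = b"\x88"

# In OEM code page 852 this byte is the letter "ł".
oem_text = raw.decode("cp852")

# In ANSI code page 1250 the same byte is undefined, so decoding fails
# with the "character maps to <undefined>" message.
try:
    raw.decode("cp1250")
    ansi_error = None
except UnicodeDecodeError as exc:
    ansi_error = str(exc)

# Decoding ordinary single-byte text as UTF-16LE pairs the bytes up,
# which lands in the CJK range: b"ab" becomes U+6261.
misdecoded = b"ab".decode("utf-16-le")

print(oem_text)
print(ansi_error)
print(hex(ord(misdecoded)))
```

If the program writes UTF-8 (as git appears to in the last comment), decoding with 'utf-8' is the right choice; UTF-16 only applies to cmd started with `/u`.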
3

In this case, using subprocess.Popen is too general, too verbose and too hard to remember. Use subprocess.check_output instead.

It returns a bytes object, which can be converted to str with its decode method.

import subprocess
x = subprocess.check_output(['ls','/'])
print(x.decode('utf-8'))
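
On Python 3.6 and later, check_output also accepts the `encoding` and `errors` arguments of subprocess.Popen, so the decoding step can be folded into the call. The following is a sketch under assumptions: the child command is a hypothetical stand-in (a second Python process that writes the byte 0x88) so that it runs on any platform, and cp852 is used only because that is the OEM code page discussed for the question. On Windows, `encoding="oem"` may be the more appropriate choice.

```python
import subprocess
import sys

# Hypothetical stand-in for a console program: a child Python process
# that writes the single byte 0x88, which is "ł" in code page 852.
child = [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'\\x88')"]

# Decode with an explicit encoding instead of the locale default.
text = subprocess.check_output(child, encoding="cp852")

# errors="replace" substitutes U+FFFD for undecodable bytes
# instead of raising UnicodeDecodeError.
lossy = subprocess.check_output(child, encoding="cp1250", errors="replace")

print(text)
print(repr(lossy))
```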

user202729
0

If, like me, you used a with statement together with readline() in Python 2 (in my case for a time zone utility on Windows), the same code won't work in Python 3:

with os.popen("tzutil /l") as source:
    key, value = self.get_key_value(source, True)
    while value and key:
        timezones_to_json.append({u"key": key, u"value": value, u"toolTip": key})
        key, value = self.get_key_value(source, False)
return timezones_to_json

def get_key_value(self, source, first=False):
    if not first:
        source.readline()
    value = source.readline().strip()
    key = source.readline().strip()
    return key, value

So my changes to python3 were:

  1. As @Josh Lee suggested, I used subprocess.Popen instead, but then I got an AttributeError: __exit__

  2. So I had to append .stdout, so that the object in the with statement has __enter__ and __exit__ methods:

    with subprocess.Popen(['tzutil', '/l'], stdout=subprocess.PIPE).stdout as source:
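
The reading loop itself can be sketched independently of the process. This is a hedged illustration, not the original utility: `iter_key_values` is a hypothetical helper, and the sample text is made up to match the layout the code above assumes (a value line, a key line, then a separator line). It is not captured tzutil output. An io.StringIO object stands in for the pipe; with a real process, `source` would need to be a text-mode stream, for example one opened by passing `encoding=...` to subprocess.Popen.

```python
import io


def iter_key_values(source):
    """Yield (key, value) pairs from lines laid out as: value, key, separator."""
    while True:
        value = source.readline().strip()
        key = source.readline().strip()
        if not (key and value):
            break  # end of input, or an incomplete record
        yield key, value
        source.readline()  # skip the separator line between records


# Made-up sample in the assumed three-line layout.
sample = io.StringIO(
    "Display name A\nZone ID A\n\n"
    "Display name B\nZone ID B\n"
)

timezones_to_json = [
    {"key": key, "value": value, "toolTip": key}
    for key, value in iter_key_values(sample)
]
print(timezones_to_json)
```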