I am trying to get a directory listing on a Windows 10 file system using the subprocess.Popen function and the dir command in Python 3.8.2. To be more specific, I have this piece of code:
import subprocess
process = subprocess.Popen(['dir'], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in iter(process.stdout.readline, b''):
    print(line.decode('utf-16'))
process.stdout.close()
When I run the above in a directory that has file names with Unicode characters (such as "háčky a čárky.txt"), I get the following error:
Traceback (most recent call last):
  File "error.py", line 5, in <module>
    print(line.decode('utf-16'))
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x0a in position 42: truncated data
Obviously, the problem is with the encoding. I have tried using 'utf-8' instead of 'utf-16', but with no success. When I remove the decode('utf-16') call and use just print(line), I get the following output:
b' Volume in drive C is OSDisk\r\n'
b' Volume Serial Number is 9E2B-67E3\r\n'
b'\r\n'
b' Directory of C:\\Users\\asamec\\Dropbox\\DIY\\Python\\AccessibleRunner\\AccessibleRunner\r\n'
b'\r\n'
b'05/14/2021 09:19 AM <DIR> .\r\n'
b'05/14/2021 09:19 AM <DIR> ..\r\n'
b'05/13/2021 09:46 PM 5,697 AccessibleRunner.py\r\n'
b'05/14/2021 09:18 AM 214 error.py\r\n'
b'05/13/2021 05:48 PM 5,642 h\xa0cky a c\xa0rky.txt.py\r\n'
b' 3 File(s) 11,553 bytes\r\n'
b' 2 Dir(s) 230,706,778,112 bytes free\r\n'
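These bytes appear to be plain single-byte text rather than UTF-16, which would explain the first traceback. A small sketch using only two lines copied from this output shows the two ways decoding them as UTF-16 goes wrong; it assumes nothing beyond the bytes shown here.

```python
# Two lines copied verbatim from the dir output above.
even_line = b' Volume in drive C is OSDisk\r\n'  # 30 bytes
odd_line = b' 3 File(s) 11,553 bytes\r\n'        # 25 bytes

# An even-length line "decodes" without complaint, but each pair of ASCII
# bytes is merged into one unrelated 16-bit character, so the text is garbage.
garbled = even_line.decode('utf-16')
print(len(garbled), ascii(garbled))

# An odd-length line leaves one byte over at the end, which produces the same
# kind of "truncated data" error as the traceback in the question.
error = None
try:
    odd_line.decode('utf-16')
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` itself is cleared after the block
print(error)
```

The leftover byte in the odd-length case is the trailing 0x0a of '\r\n', matching the byte named in the traceback.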
When I remove the 'utf-16' argument and leave just print(line.decode()), I get the following error:
Traceback (most recent call last):
  File "error.py", line 5, in <module>
    print(line.decode())
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 40: invalid start byte
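Both errors are consistent with dir writing in the console's OEM code page. Assuming that code page is 437 (the usual default on US-English Windows; this is inferred from the bytes, not stated anywhere in the output), byte 0xA0 is 'á', which is also not a valid start byte in UTF-8. A sketch decoding the affected line under that assumption:

```python
# The line with the accented file name, copied verbatim from the dir output above.
raw = b'05/13/2021 05:48 PM 5,642 h\xa0cky a c\xa0rky.txt.py\r\n'

# Assumption: the console code page is 437. It is a single-byte encoding, so
# every line (and every byte) can be decoded on its own; 0xA0 maps to 'á'.
text = raw.decode('cp437').rstrip('\r\n')
print(text)
```

This recovers 'á' but not 'č': code page 437 has no 'č', so the bytes already contain a plain 'c' before Python ever sees them, and no choice of decoding can bring it back. On Windows, Python 3.6+ also provides an 'oem' codec that uses the system's OEM code page instead of a hard-coded cp437.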
So the question is: how should I decode the process's standard output so that the correct characters are printed?
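One way to sidestep the code page entirely, sketched here as an alternative rather than a fix for the code above: cmd.exe's /u switch makes internal commands such as dir write UTF-16-LE when their output goes to a pipe or file. The function names below are hypothetical and the listing part assumes Windows; the decoding helper is kept separate so it can be tried on any platform.

```python
import subprocess
import sys
from typing import List


def decode_utf16_output(raw: bytes) -> List[str]:
    """Split UTF-16-LE output (as written by `cmd /u`) into text lines."""
    # Decode the whole buffer at once: in UTF-16-LE a newline is b'\n\x00',
    # so readline(), which splits right after the 0x0A byte, would leave a
    # stray 0x00 at the start of every following line.
    return raw.decode('utf-16-le').splitlines()


def dir_listing_unicode(path: str = '.') -> List[str]:
    """Return the `dir` listing of `path` with file names intact (Windows only)."""
    # Run cmd directly, so shell=True is not needed:
    # /u = Unicode output to pipes and files, /c = run the command and exit.
    completed = subprocess.run(
        ['cmd', '/u', '/c', 'dir', path],
        stdout=subprocess.PIPE,
        check=True,
    )
    return decode_utf16_output(completed.stdout)


if __name__ == '__main__' and sys.platform == 'win32':
    for line in dir_listing_unicode():
        print(line)
```

Reading the full output with subprocess.run instead of iterating with readline is the main design choice here, since line splitting on raw bytes is not safe for a two-byte encoding.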
Update
Running the chcp 65001 command in the Windows command line before running the Python script is the solution. However, the following gives me the same error as above:
import subprocess
process = subprocess.Popen(['cmd', '/c', 'chcp 65001 & dir'], shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in iter(process.stdout.readline, b''):
    print(line.decode('utf-16'))
process.stdout.close()
However, when I run the same Python script a second time, it starts to work, because the code page is already set to 65001. So the question now is: how can I set the Windows console code page from within the Python script, rather than before running it?
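A sketch of doing what chcp 65001 does from inside the script, assuming Python is running in a console window and that the child cmd.exe picks up the code page of the console it shares with Python. It calls the Win32 functions SetConsoleCP and SetConsoleOutputCP through ctypes before starting dir, then restores the previous values so the console is not left at 65001 (chcp changes the shared console, which is likely why the second run behaved differently). The helper names are hypothetical, and only the decoding helper runs outside Windows.

```python
import ctypes
import subprocess
import sys
from typing import List, Sequence

UTF8_CODE_PAGE = 65001


def decode_console_bytes(raw: bytes, code_page: int = UTF8_CODE_PAGE) -> List[str]:
    """Decode output produced under `code_page` into text lines."""
    # 65001 is UTF-8; other console code pages (437, 852, ...) have matching
    # Python codecs named 'cp437', 'cp852', ...
    encoding = 'utf-8' if code_page == UTF8_CODE_PAGE else f'cp{code_page}'
    return raw.decode(encoding).splitlines()


def run_with_code_page(args: Sequence[str], code_page: int = UTF8_CODE_PAGE) -> bytes:
    """Run `args` while this console uses `code_page` (Windows only)."""
    kernel32 = ctypes.windll.kernel32
    old_input_cp = kernel32.GetConsoleCP()
    old_output_cp = kernel32.GetConsoleOutputCP()
    try:
        # Mirror `chcp`, which changes both the input and the output code page.
        if not kernel32.SetConsoleCP(code_page) or not kernel32.SetConsoleOutputCP(code_page):
            raise ctypes.WinError()
        return subprocess.run(list(args), stdout=subprocess.PIPE, check=True).stdout
    finally:
        # Put the console back the way it was when the script started.
        kernel32.SetConsoleCP(old_input_cp)
        kernel32.SetConsoleOutputCP(old_output_cp)


if __name__ == '__main__' and sys.platform == 'win32':
    raw = run_with_code_page(['cmd', '/c', 'dir'])
    for line in decode_console_bytes(raw):
        print(line)
```

If the script is started without a console (for example by pythonw.exe), the Set* calls are expected to fail and this sketch raises instead of silently producing mis-decoded output.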