I'm trying to retrieve the contents of a zip archive with Python 2.7 on 64-bit Windows Vista. I tried making a system call to 7-Zip (my favourite archive manager) using the subprocess module:
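```python
# -*- coding: utf-8 -*-
import sys, os, subprocess

Extractor = r'C:\Program Files\7-Zip\7z.exe'
ArchiveName = r'C:\temp\bla.zip'
# 'l' lists the archive contents; '-slt' switches the listing to technical mode
# communicate() reads all of stdout and waits for 7z.exe to exit
output = subprocess.Popen([Extractor, 'l', '-slt', ArchiveName],
                          stdout=subprocess.PIPE).communicate()[0]
```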
This works fine as long as the archive contains only ASCII filenames, but with non-ASCII names the output string contains escaped bytes instead: ä, ë, ö and ü have been replaced by \x84, \x89, \x94 and \x81 (and so on). I've tried all kinds of decode/encode calls, but I'm too inexperienced with Python to get back the original characters with umlauts, which I need if I want to follow this step up with, e.g., a 7z extraction call via subprocess.
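For reference, the four escaped bytes are exactly how ä, ë, ö and ü are encoded in the DOS/OEM code pages cp437 and cp850. That suggests (an assumption based only on these bytes) that 7z.exe writes its piped output in the console's OEM code page rather than in UTF-8 or cp1252. A quick check:

```python
# The bytes seen in the 7z output in place of a, e, o, u with umlauts
raw = b'\x84\x89\x94\x81'

# Decode with the two common Western OEM code pages; both map these
# four bytes to the same four umlaut characters.
decoded = dict((codec, raw.decode(codec)) for codec in ('cp437', 'cp850'))

for codec in sorted(decoded):
    # unicode_escape keeps the printout ASCII-only, whatever the console
    print('%s %r' % (codec, decoded[codec].encode('unicode_escape')))
```

By contrast, decoding the same bytes as cp1252 fails, because Python's cp1252 codec has no mapping for 0x81.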
Simply put, my question is: how do I get this to work for archives with non-ASCII content as well?
...or, to put it in a more convoluted way: is the output of subprocess always in a fixed encoding or not? If it is, which encoding is it? If it isn't, how can I control or find out the encoding of the subprocess output? Inspired by similar questions on this site, I've tried adding
```python
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
```
and I've also tried
```python
my_env = os.environ.copy()  # copy, so this script's own environment is left unchanged
my_env['PYTHONIOENCODING'] = 'utf-8'
output = subprocess.Popen([Extractor, 'l', '-slt', ArchiveName],
                          stdout=subprocess.PIPE, env=my_env).communicate()[0]
```
but neither seems to change the encoding of the output variable (or to restore the umlauts).
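One likely reason: `PYTHONIOENCODING` is read only by a Python interpreter, and `codecs.getwriter` only wraps this script's own `sys.stdout`, so neither setting reaches 7z.exe. The pipe carries whatever bytes 7z.exe writes, and those have to be decoded afterwards. A minimal sketch, assuming 7z.exe writes its listing in the Windows OEM code page; `oem_encoding`, `decode_listing` and `list_archive` are hypothetical helper names, while `GetOEMCP` is the Win32 function, reached here through `ctypes`:

```python
import subprocess
import sys


def oem_encoding():
    """Return the Windows OEM code page as a codec name, e.g. 'cp850'.

    Assumption: 7z.exe encodes its piped output in the OEM code page.
    Off Windows there is no OEM code page, so a hypothetical default is used.
    """
    if sys.platform == 'win32':
        import ctypes
        # GetOEMCP() returns the numeric OEM code page identifier, e.g. 850
        return 'cp%d' % ctypes.windll.kernel32.GetOEMCP()
    return 'cp850'


def decode_listing(raw, encoding=None):
    """Decode raw 7z output bytes and return the text as a list of lines."""
    return raw.decode(encoding or oem_encoding()).splitlines()


def list_archive(extractor, archive):
    """Run '7z l -slt' on archive and return its decoded output lines."""
    raw = subprocess.Popen([extractor, 'l', '-slt', archive],
                           stdout=subprocess.PIPE).communicate()[0]
    return decode_listing(raw)


if __name__ == '__main__':
    # Decode a sample line shaped like one entry of the -slt listing
    for line in decode_listing(b'Path = b\x84r.txt\r\n', 'cp850'):
        print(repr(line))
```

If the OEM guess turns out to be wrong for a given machine, passing an explicit `encoding` such as `'cp437'` to `decode_listing` overrides it.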