I am trying to port a piece of Python 2 code to Python 3. The code works perfectly in Python 2 but fails in Python 3. In the original Python 2 code, data is pickled, compressed, and added to a tar stream as follows:
_tar = tarfile.open(name, mode="w|")
data = StringIO()
data.write(compress(dumps(probe, HIGHEST_PROTOCOL)))
data.seek(0)
info = tarfile.TarInfo()
info.name = 'Probe_%s.lzo' % dest
info.uid = 0
info.gid = 0
info.size = len(data.buf)
info.mode = S_IMODE(0o0444)
info.mtime = mktime(probe.circs[0].created.timetuple())
_tar.addfile(tarinfo=info, fileobj=data)
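For comparison, here is a minimal Python 3 sketch of the same writer. It assumes the payload should stay bytes end to end, so `io.BytesIO` replaces `StringIO`, and `len(payload)` replaces the Python 2-only `data.buf` attribute. The question does not show where `compress` comes from (the `.lzo` suffix suggests an LZO binding), so `zlib.compress` is used here only as a stand-in. `add_probe` and its `created` parameter (standing in for `probe.circs[0].created`) are hypothetical names.

```python
import io
import pickle
import tarfile
import time
import zlib
from stat import S_IMODE


def add_probe(tar, probe, dest, created, compress=zlib.compress):
    """Add one pickled, compressed object to an open tarfile (Python 3 sketch).

    `compress` is a placeholder for the question's unshown compressor;
    zlib.compress is only a stand-in. `created` replaces probe.circs[0].created.
    """
    # pickle.dumps returns bytes in Python 3, and so does the compressor.
    payload = compress(pickle.dumps(probe, pickle.HIGHEST_PROTOCOL))
    data = io.BytesIO(payload)          # a bytes buffer, not a text StringIO
    info = tarfile.TarInfo(name="Probe_%s.lzo" % dest)
    info.uid = 0
    info.gid = 0
    info.size = len(payload)            # replaces Python 2's data.buf
    info.mode = S_IMODE(0o444)
    # int() keeps the header a plain ustar field; a float mtime would make
    # Python 3's default PAX format emit an extra pax header for it.
    info.mtime = int(time.mktime(created.timetuple()))
    tar.addfile(tarinfo=info, fileobj=data)
```

It would be called on an archive opened with `tarfile.open(name, mode="w|")` as before, for example `add_probe(_tar, probe, dest, probe.circs[0].created, compress=compress)`.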
Now, in another script, this data is read back as follows:
with tarfile.open(fileobj=stdin, mode="r|") as tar:
    while True:
        cprobe = tar.next()
        if not cprobe:
            raise StopIteration()
        tarx = tar.extractfile(cprobe)
        if not tarx:
            continue
        yield tarx.read()
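A sketch of the reader as a Python 3 generator, assuming the caller passes a binary stream such as `sys.stdin.buffer` rather than the text-mode `sys.stdin`. It also replaces `raise StopIteration()` with `return`: under PEP 479 (the default from Python 3.7), a `StopIteration` raised inside a generator is turned into a `RuntimeError`. The function name `iter_member_data` is hypothetical.

```python
import io
import tarfile


def iter_member_data(stream):
    """Yield the raw bytes of each regular-file member of a tar stream.

    `stream` must be a binary file object, e.g. sys.stdin.buffer.
    """
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        while True:
            member = tar.next()
            if member is None:
                # End of archive: a plain return ends the generator.
                # Raising StopIteration here is a RuntimeError under PEP 479.
                return
            fobj = tar.extractfile(member)
            if fobj is None:
                # Directories, links, etc. have no data to read.
                continue
            yield fobj.read()
```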
The second script is intended to be called in the following way:
cat outputOfFirst | python ./second.py 1> outputOfSecond
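Because the second script's output is redirected to a file, its data presumably goes to stdout as bytes as well; the question does not show that part, so this is an assumption. In Python 3 both `sys.stdin` and `sys.stdout` are text streams, and their `.buffer` attributes expose the underlying binary streams. A minimal sketch of that wiring, with hypothetical `copy_members` and `main` names:

```python
import sys
import tarfile


def copy_members(src, dst):
    """Write the bytes of every regular file in tar stream `src` to `dst`.

    Both arguments must be binary file objects.
    """
    with tarfile.open(fileobj=src, mode="r|") as tar:
        for member in tar:  # iteration also works in stream ("r|") mode
            fobj = tar.extractfile(member)
            if fobj is not None:  # skip directories, links, etc.
                dst.write(fobj.read())


def main():
    # sys.stdin / sys.stdout decode and encode text in Python 3; .buffer
    # is the raw binary stream that tarfile and bytes output need.
    copy_members(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

`second.py` would then end with `if __name__ == "__main__": main()`, and the `cat outputOfFirst | python ./second.py 1> outputOfSecond` pipeline stays unchanged.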
This works fine in Python 2. But if I take the output of the first script (generated with Python 2) and pass it to the second script running under Python 3, I get the following error:
    with tarfile.open(fileobj=stdin, mode="r|") as tar:
  File "/usr/lib/python3.6/tarfile.py", line 1601, in open
    t = cls(name, filemode, stream, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1482, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.6/tarfile.py", line 2297, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.6/tarfile.py", line 1092, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/usr/lib/python3.6/tarfile.py", line 539, in read
    buf = self._read(size)
  File "/usr/lib/python3.6/tarfile.py", line 547, in _read
    return self.__read(size)
  File "/usr/lib/python3.6/tarfile.py", line 572, in __read
    buf = self.fileobj.read(self.bufsize)
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 512: invalid continuation byte
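The traceback can be reproduced without a pipe. In Python 3, `sys.stdin` is an `io.TextIOWrapper` that decodes incoming bytes (as UTF-8 under a UTF-8 locale), so tarfile is handed decoded text and fails on the first byte that is not valid UTF-8. Since a tar header is the first 512 bytes and is plain ASCII, `position 512` in the error likely points at the first byte of the compressed member data. A sketch under those assumptions, where `fake_stdin`, `make_tar` and `first_member_name` are hypothetical helpers:

```python
import io
import tarfile


def make_tar(payload):
    """Build an in-memory tar stream with one member holding `payload`."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w|") as tar:
        info = tarfile.TarInfo("Probe_demo.lzo")  # hypothetical member name
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def fake_stdin(data):
    """Mimic Python 3's sys.stdin under a UTF-8 locale: text over bytes."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")


def first_member_name(stream):
    """Open `stream` the way the second script does and read one header."""
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        return tar.next().name


data = make_tar(b"\xf0\x00 compressed bytes, not UTF-8")
# first_member_name(fake_stdin(data)) raises UnicodeDecodeError, like the
# traceback, while the binary .buffer underneath reads the archive fine:
stdin = fake_stdin(data)
print(first_member_name(stdin.buffer))
```

With `.buffer`, tarfile reads the raw bytes directly, so no text encoding (latin-1 or otherwise) is involved.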
What is the Python 3 equivalent of this? My understanding is that I somehow have to read stdin with an encoding like "latin-1", but I am not sure how to do that.