I am trying to port a piece of Python 2 code to Python 3. The code works perfectly in Python 2 but fails in Python 3. In the original Python 2 code, data is compressed and written into a tar file as follows:

_tar = tarfile.open(name, mode="w|")
data = StringIO()
data.write(compress(dumps(probe, HIGHEST_PROTOCOL)))
data.seek(0)
info = tarfile.TarInfo()
info.name = 'Probe_%s.lzo' % dest
info.uid = 0
info.gid = 0
info.size = len(data.buf)
info.mode = S_IMODE(0o0444)
info.mtime = mktime(probe.circs[0].created.timetuple())
_tar.addfile(tarinfo=info, fileobj=data)
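
A note on porting the writer itself: under Python 3 this snippet would probably need a binary buffer, since `StringIO` only accepts text there and `io.BytesIO` has no `.buf` attribute. The sketch below is one possible shape, not the original code: `add_blob` and `pack` are hypothetical helper names, and `zlib` stands in for the original `compress` function, whose library is not shown in the question.

```python
import io
import pickle
import tarfile
import time
import zlib


def pack(obj):
    """Pickle and compress an object into bytes."""
    # Stand-in for compress(dumps(probe, HIGHEST_PROTOCOL)); zlib is an
    # assumption here, the original compressor is not shown.
    return zlib.compress(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


def add_blob(tar, name, payload, mtime=None):
    """Append `payload` (bytes) to an open tar archive as a read-only file."""
    data = io.BytesIO(payload)      # binary buffer instead of StringIO
    info = tarfile.TarInfo(name=name)
    info.uid = 0
    info.gid = 0
    info.size = len(payload)        # replaces len(data.buf)
    info.mode = 0o444
    info.mtime = time.time() if mtime is None else mtime
    tar.addfile(tarinfo=info, fileobj=data)
```

It would be used in place of the block above, for example `add_blob(_tar, 'Probe_%s.lzo' % dest, pack(probe))`, with `mtime` passed explicitly if the creation timestamp should be preserved.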

In another script, the resulting archive is read in the following way:

with tarfile.open(fileobj=stdin, mode="r|") as tar:
    while True:
        cprobe = tar.next()
        if not cprobe:
            raise StopIteration()
        tarx = tar.extractfile(cprobe)
        if not tarx:
            continue
        yield tarx.read()

The second script is intended to be called in the following way:

cat outputOfFirst | python ./second.py 1> outputOfSecond

This works fine in Python 2. If I take the output that the first script generated under Python 2 and pass it to the second script under Python 3, I get the following error:

    with tarfile.open(fileobj=stdin, mode="r|") as tar:
  File "/usr/lib/python3.6/tarfile.py", line 1601, in open
    t = cls(name, filemode, stream, **kwargs)
  File "/usr/lib/python3.6/tarfile.py", line 1482, in __init__
    self.firstmember = self.next()
  File "/usr/lib/python3.6/tarfile.py", line 2297, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.6/tarfile.py", line 1092, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/usr/lib/python3.6/tarfile.py", line 539, in read
    buf = self._read(size)
  File "/usr/lib/python3.6/tarfile.py", line 547, in _read
    return self.__read(size)
  File "/usr/lib/python3.6/tarfile.py", line 572, in __read
    buf = self.fileobj.read(self.bufsize)
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 512: invalid continuation byte

What would be the Python 3 equivalent of this? My understanding is that I have to somehow set the encoding of stdin to something like "latin-1", but I am not sure how that would be done.

Khizar Amin
  • I suspect the problem is caused by stdin being opened by default in text mode (while what you feed it is binary data). You might need to use `stdin.buffer` instead of `stdin`, see https://stackoverflow.com/a/38939320/2897372 – Błotosmętek May 09 '20 at 20:21
  • If you suspect the data is encoded in latin-1 (could be the case if the Python 2 code ran on an old machine) try opening the tarfile with `tarfile.open(fileobj=stdin, encoding='iso-8859-1', mode="r|")` – MarkM May 09 '20 at 20:31
  • @Błotosmętek That resolved the error, thanks a bunch (y) – Khizar Amin May 09 '20 at 20:34
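
For later readers, the `stdin.buffer` suggestion from the comments could be sketched roughly as follows. This is a minimal illustration, not the asker's final code: `read_members` and `main` are names chosen here, and writing each payload to stdout is only a guess at what the second script does with the data. The generator also ends by letting the `for` loop finish, because `raise StopIteration()` inside a generator becomes a `RuntimeError` from Python 3.7 on (PEP 479).

```python
import sys
import tarfile


def read_members(stream):
    """Yield the raw bytes of every file-like member in a tar stream."""
    # mode="r|" reads the archive sequentially, so the stream does not
    # need to be seekable (a pipe works).
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            extracted = tar.extractfile(member)
            if extracted is None:   # e.g. a directory entry, which has no data
                continue
            yield extracted.read()


def main():
    # sys.stdin decodes input as text (UTF-8 here, hence the error);
    # sys.stdin.buffer is the underlying binary stream.
    for payload in read_members(sys.stdin.buffer):
        sys.stdout.buffer.write(payload)   # assumed output step
```

With `main()` called at the bottom of the script, the invocation stays the same: `cat outputOfFirst | python3 ./second.py 1> outputOfSecond`.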

0 Answers