I've got some tar data in bytes, and want to read it without writing it to the file system.

Writing it to the file system works:

with open('out.tar', 'wb') as f:
    f.write(data)

Then, in the shell: tar -xzvf out.tar

But the following raises an error:

import tarfile
tarfile.open(data, 'r')

'''
  File ".../lib/python3.7/tarfile.py", line 1591, in open
    return func(name, filemode, fileobj, **kwargs)
  File ".../lib/python3.7/tarfile.py", line 1638, in gzopen
    fileobj = gzip.GzipFile(name, mode + "b", compresslevel, fileobj)
  File ".../lib/python3.7/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(fil
'''

What is the right way to read the tar in memory?

Update

The following works:

from io import BytesIO
tarfile.open(fileobj=BytesIO(data), mode='r')

Why?

tarfile.open is supposed to be able to work with bytes. Converting the bytes to a file-like object myself and then telling tarfile.open to use the file-like object works, but why is the transformation necessary? When does the raw bytes-based API work vs. not work?

Max Heiber
  • Possible duplicate? https://stackoverflow.com/questions/44672524/how-to-create-in-memory-file-object/44672691 – Jared Smith Oct 28 '20 at 10:17
  • Not exactly a duplicate, but **very** similar. Using BytesIO is the way to go. – Iñigo González Oct 28 '20 at 10:22
  • `tarfile.open(BytesIO(bytes), 'r')` leads to the same error message - what did you have in mind re BytesIO @Iñigo? – Max Heiber Oct 28 '20 at 10:44
  • @MaxHeiber - that was what I had in mind: tarfile.open(fileobj=BytesIO(the_data), mode='r'). It looks like there is something in the gzip compression that the module cannot handle. – Iñigo González Oct 28 '20 at 14:24
  • Sounds likely. Still not sure why, though, since the docs say that `tarfile.open` is supposed to detect the compression method and handle it accordingly (including gzip): https://docs.python.org/3/library/tarfile.html. – Max Heiber Oct 28 '20 at 16:28

1 Answer

You can open the data with tarfile by wrapping the bytes in a BytesIO stream, and then read each member from the archive.

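import tarfile
from io import BytesIO

# Despite its name, your_file_name must hold the archive's raw bytes, not a path
with tarfile.open(fileobj=BytesIO(your_file_name)) as tar:
    for tar_file in tar:
        if tar_file.isfile():
            # extractfile returns a file-like object for regular files
            inner_data = tar.extractfile(tar_file).read().decode('utf-8')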
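
On the why: the first positional parameter of tarfile.open is name, which is treated as a path to open on disk. That matches the traceback in the question, which ends in builtins.open. In-memory data has to go through the fileobj parameter instead. Here is a minimal self-contained sketch of the round trip; make_tar_gz and read_tar_bytes are hypothetical helper names, and the sample file names are made up for illustration:

```python
import io
import tarfile


def make_tar_gz(files):
    """Build a gzip-compressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # size must be set before addfile
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_tar_bytes(data):
    """Return {member name: contents} for the regular files in tar bytes."""
    contents = {}
    # Pass the bytes as a file-like object via fileobj, not as name
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        for member in tar:
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


# Made-up sample data, only to exercise the two helpers
data = make_tar_gz({"hello.txt": b"hello\n", "dir/n.txt": b"42"})
result = read_tar_bytes(data)
print(result)
```

Mode 'r' lets tarfile detect the compression on its own, so the same read_tar_bytes call should handle both plain and gzip-compressed archives.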
Ravi kant Gautam
  • Thanks for your answer. It looks like what I posted in the update before your answer, but I think you meant BytesIO(bytes) rather than `BytesIO(your_file_name)`. I still don't understand why tarfile.open(bytes) doesn't work, or what the error message means. – Max Heiber Oct 28 '20 at 13:54