I have a .tar file that contains three folders, of which I only want one. Within that folder are text files that have themselves been compressed (as .gz files), and these are the files I want to decompress and extract. Furthermore, I only want half of them: those whose names end in v8.EUR.signif_pairs.txt.gz.

All I've managed to do is extract the folder that contains the files I need (they remain compressed) by doing as follows:

import tarfile
my_tar = tarfile.open('D:\\Large data files\\Stracquadanio data\\GTEx_Analysis_v8_eQTL_EUR.tar')
names = my_tar.getnames()
names_f = [x for x in names if 'v8.EUR.signif_pairs.txt.gz' in x]
my_tar.extractall(path = '../../data/interim/GTEx', members=[x for x in my_tar.getmembers() if x.name in names_f])
my_tar.close()

Even then, I get a PermissionError partway through, so only about half of the files I want are extracted (26 of 49). The traceback is as follows:

PermissionError                           Traceback (most recent call last)
<ipython-input-63-7feeb8849e53> in <module>
      3 names = my_tar.getnames()
      4 names_f = [x for x in names if 'v8.EUR.signif_pairs.txt.gz' in x]
----> 5 my_tar.extractall(path = '../../data/interim/GTEx', members=[x for x in my_tar.getmembers() if x.name in names_f])
      6 my_tar.close()

~\anaconda3\lib\tarfile.py in extractall(self, path, members, numeric_owner)
   1998             # Do not set_attrs directories, as we will do that further down
   1999             self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
-> 2000                          numeric_owner=numeric_owner)
   2001 
   2002         # Reverse sort directories.

~\anaconda3\lib\tarfile.py in extract(self, member, path, set_attrs, numeric_owner)
   2040             self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
   2041                                  set_attrs=set_attrs,
-> 2042                                  numeric_owner=numeric_owner)
   2043         except OSError as e:
   2044             if self.errorlevel > 0:

~\anaconda3\lib\tarfile.py in _extract_member(self, tarinfo, targetpath, set_attrs, numeric_owner)
   2110 
   2111         if tarinfo.isreg():
-> 2112             self.makefile(tarinfo, targetpath)
   2113         elif tarinfo.isdir():
   2114             self.makedir(tarinfo, targetpath)

~\anaconda3\lib\tarfile.py in makefile(self, tarinfo, targetpath)
   2159                 target.truncate()
   2160             else:
-> 2161                 copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
   2162 
   2163     def makeunknown(self, tarinfo, targetpath):

~\anaconda3\lib\tarfile.py in copyfileobj(src, dst, length, exception, bufsize)
    245     blocks, remainder = divmod(length, bufsize)
    246     for b in range(blocks):
--> 247         buf = src.read(bufsize)
    248         if len(buf) < bufsize:
    249             raise exception("unexpected end of data")

PermissionError: [Errno 13] Permission denied
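
One way to narrow down which members trigger the error is to extract them one at a time and record the failures instead of stopping at the first one. The following is only a diagnostic sketch: the helper name `extract_one_by_one` is hypothetical, and it does not assume any particular cause for the PermissionError.

```python
import os
import tarfile


def extract_one_by_one(tar_path, out_dir, suffix):
    """Hypothetical helper: extract matching members individually.

    Returns (extracted, failed), where failed holds (name, exception) pairs.
    """
    extracted, failed = [], []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.name.endswith(suffix):
                continue
            try:
                tar.extract(member, path=out_dir)
                extracted.append(member.name)
            except OSError as err:  # PermissionError is a subclass of OSError
                failed.append((member.name, err))
    return extracted, failed
```

Printing the `failed` list should show whether the same members fail on every run or whether the failures move around between runs.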

Therefore, my problem is two-fold:

  1. How do I extract the text files I want?
  2. How do I solve the PermissionError? Is it something to do with the files themselves or something to do with my methods?

I realise that I could extract the folder with the compressed text files first and then decompress the files within it, but the intermediate copies would occupy a lot of disk space, and reducing disk usage is the point of this script.
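
For reference, the single-pass approach could look roughly like the sketch below, which reads each matching member straight out of the archive and writes only the decompressed text. It assumes the inner files are ordinary gzip files (as the .gz suffix suggests) and that the outer archive is a plain tar; the function name `extract_decompressed` is made up for illustration.

```python
import gzip
import os
import shutil
import tarfile

SUFFIX = "v8.EUR.signif_pairs.txt.gz"


def extract_decompressed(tar_path, out_dir, suffix=SUFFIX):
    """Hypothetical helper: write each matching .gz member as plain text.

    Returns the list of output paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            # Drop the folder part and the trailing ".gz" for the output name.
            out_name = os.path.basename(member.name)[: -len(".gz")]
            out_path = os.path.join(out_dir, out_name)
            # extractfile() returns a file-like view into the archive;
            # gzip.open() decompresses that stream on the fly.
            with tar.extractfile(member) as raw, \
                    gzip.open(raw) as src, \
                    open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(out_path)
    return written


# Example call with the paths from the question:
# extract_decompressed(
#     r"D:\Large data files\Stracquadanio data\GTEx_Analysis_v8_eQTL_EUR.tar",
#     "../../data/interim/GTEx",
# )
```

Because the output names drop the folder part, this flattens everything into `out_dir`; that is a simplification and would need changing if two members shared a base name.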

I'd like to add that this was written in a Jupyter notebook on Windows 10, in case that is relevant. Furthermore, I have previously extracted the text files manually, so the files are not protected or password-locked.

misxif
  • this may help https://stackoverflow.com/questions/35865099/python-extracting-specific-files-with-pattern-from-tar-gz-without-extracting-th – deadshot Aug 01 '20 at 18:00
  • Thanks for the suggestion @deadshot, but that's essentially what I tried, and the issue comes from the files being within a folder. – misxif Aug 02 '20 at 07:45
