I have a .tar file that contains 3 folders, of which I only want one. Within that folder are text files that have themselves been compressed (as .gz files), and these are the files I want to decompress. Furthermore, I only want half of these text files (those whose names end in v8.EUR.signif_pairs.txt.gz).
So far, all I've managed to do is extract the folder containing the files I need (they remain compressed), as follows:
import tarfile
my_tar = tarfile.open('D:\\Large data files\\Stracquadanio data\\GTEx_Analysis_v8_eQTL_EUR.tar')
names = my_tar.getnames()
names_f = [x for x in names if 'v8.EUR.signif_pairs.txt.gz' in x]
my_tar.extractall(path = '../../data/interim/GTEx', members=[x for x in my_tar.getmembers() if x.name in names_f])
my_tar.close()
Even then, I get a PermissionError partway through, so only about half of the files I want (26 of 49) are extracted. The error is as follows:
PermissionError Traceback (most recent call last)
<ipython-input-63-7feeb8849e53> in <module>
3 names = my_tar.getnames()
4 names_f = [x for x in names if 'v8.EUR.signif_pairs.txt.gz' in x]
----> 5 my_tar.extractall(path = '../../data/interim/GTEx', members=[x for x in my_tar.getmembers() if x.name in names_f])
6 my_tar.close()
~\anaconda3\lib\tarfile.py in extractall(self, path, members, numeric_owner)
1998 # Do not set_attrs directories, as we will do that further down
1999 self.extract(tarinfo, path, set_attrs=not tarinfo.isdir(),
-> 2000 numeric_owner=numeric_owner)
2001
2002 # Reverse sort directories.
~\anaconda3\lib\tarfile.py in extract(self, member, path, set_attrs, numeric_owner)
2040 self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
2041 set_attrs=set_attrs,
-> 2042 numeric_owner=numeric_owner)
2043 except OSError as e:
2044 if self.errorlevel > 0:
~\anaconda3\lib\tarfile.py in _extract_member(self, tarinfo, targetpath, set_attrs, numeric_owner)
2110
2111 if tarinfo.isreg():
-> 2112 self.makefile(tarinfo, targetpath)
2113 elif tarinfo.isdir():
2114 self.makedir(tarinfo, targetpath)
~\anaconda3\lib\tarfile.py in makefile(self, tarinfo, targetpath)
2159 target.truncate()
2160 else:
-> 2161 copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
2162
2163 def makeunknown(self, tarinfo, targetpath):
~\anaconda3\lib\tarfile.py in copyfileobj(src, dst, length, exception, bufsize)
245 blocks, remainder = divmod(length, bufsize)
246 for b in range(blocks):
--> 247 buf = src.read(bufsize)
248 if len(buf) < bufsize:
249 raise exception("unexpected end of data")
PermissionError: [Errno 13] Permission denied
Therefore, my problem is two-fold:
- How do I extract the text files I want?
- How do I solve the PermissionError? Is it something to do with the files themselves or something to do with my methods?
I realise that I could extract the folder with the compressed text files first and then decompress the files within it. However, the intermediate files would occupy a lot of disk space, and reducing disk usage is the whole point of this script.
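To make the goal concrete, here is a rough sketch of what I have in mind: read each matching member straight out of the archive and decompress it on the fly, so that no intermediate .gz copy is written to disk. The function name extract_gz_members and its parameters are made up for illustration, and it assumes the inner files are ordinary gzip files.

```python
import gzip
import shutil
import tarfile
from pathlib import Path

SUFFIX = "v8.EUR.signif_pairs.txt.gz"  # suffix taken from the file names above


def extract_gz_members(tar_path, out_dir, suffix=SUFFIX):
    """Write each matching .gz member of the tar to out_dir as plain text.

    Hypothetical helper: the gzip data is decompressed while it is read
    from the archive, so the .gz files themselves never touch the disk.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            # Drop the folder part and the trailing ".gz" from the member name.
            target = out_dir / Path(member.name).name[: -len(".gz")]
            raw = tar.extractfile(member)  # file-like object over the member
            with gzip.open(raw, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


# Intended call, using the paths from the code above:
# extract_gz_members(r'D:\Large data files\Stracquadanio data\GTEx_Analysis_v8_eQTL_EUR.tar',
#                    '../../data/interim/GTEx')
```

The decompressed text still takes up space, of course. If that is too much as well, I assume the same gzip.open(raw, "rt") handle could be read line by line instead of being copied to a file.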
I'd like to add that this was written in a Jupyter Notebook on Windows 10, in case that is relevant. Furthermore, I have previously extracted the text files manually, so the files are not password-protected or otherwise restricted.
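One more observation: the traceback ends in src.read(bufsize) inside copyfileobj, so the failure seems to happen while reading from the archive rather than while writing the output, although I don't know why. To find out which members fail, I could extract them one at a time and record the errors instead of letting the first one abort everything. This is only a diagnostic sketch, and the helper name extract_each is made up.

```python
import tarfile


def extract_each(tar_path, out_dir, suffix="v8.EUR.signif_pairs.txt.gz"):
    """Extract matching members one by one, recording failures.

    Hypothetical diagnostic helper: returns the names that were extracted
    and a list of (name, size, exception) tuples for those that were not.
    """
    done, failed = [], []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            try:
                tar.extract(member, path=out_dir)
                done.append(member.name)
            except OSError as exc:  # PermissionError is a subclass of OSError
                failed.append((member.name, member.size, exc))
    return done, failed
```

Comparing the names and sizes in the failed list with the 26 files that do get extracted might show whether the problem is tied to particular members or to how much of the archive has been read.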