Is it possible to extract a single file from a tar bundle in Python?

Question

I need to fetch a couple of files from a huge SVN repo. Fetching the whole repo takes almost an hour. The files I am looking for are inside a tar bundle.

Is it possible, using Python code, to fetch only those two files from the tar bundle without extracting the whole bundle?

If so, can anybody tell me how I should go about it?

You don't need Python to extract individual files from a tarball. `man tar` to find the options you need. Of course, you need the tarball first before you can manipulate it... — MattDMo, Dec 06 '13 at 22:48
@MattDMo I need to do it programmatically, and my tarball is stored in an SVN repo. — Rajan Pathak, Dec 06 '13 at 22:53

score 2 · Answer 1 · edited May 23 '17 at 11:46

It sounds like your question has two parts:

1. Fetching a single tar bundle from the SVN repo, without the rest of the repo's files.
2. Using Python to extract two files from the retrieved bundle.

For the first part, I'll simply refer to this post on svn export and sparse checkouts.
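For the first part, here is a minimal sketch of driving `svn export` from Python, assuming the `svn` command-line client is installed and on PATH. `svn export URL PATH` writes an unversioned copy of just that one file, so nothing else in the repository is downloaded. The helper name `svn_export_file` and the URL are hypothetical.

```python
import subprocess

def svn_export_file(url, dest):
    """Copy one versioned file (e.g. the tarball) out of the repository.

    Assumption: the `svn` CLI is on PATH. `svn export URL PATH` writes an
    unversioned copy of that single file without checking out the repo.
    Raises subprocess.CalledProcessError if svn exits with an error.
    """
    subprocess.run(["svn", "export", url, dest], check=True)
    return dest

# Hypothetical URL; substitute the real location of the tarball.
# svn_export_file("svn://host/repo/trunk/bundle.tar", "bundle.tar")
```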

For the second part, here is a solution for extracting the two files from the retrieved tarball:

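import tarfile

# Names must match the member names stored in the archive (see tar.getnames())
files_i_want = ['path/to/file1', 'path/to/file2']

with tarfile.open("bundle.tar") as tar:
    # Only the selected members are extracted, relative to the current directory
    tar.extractall(members=[x for x in tar.getmembers() if x.name in files_i_want])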
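If you only need the contents in memory, a variation is to iterate over the archive and stop as soon as both files have been seen. `getmembers()` indexes every header in the archive first, whereas iterating reads headers one at a time, so the loop can skip whatever follows the last wanted file. A sketch; the helper name `read_members` is hypothetical:

```python
import tarfile

def read_members(tar_path, wanted):
    """Return {member name: bytes} for the requested regular files.

    Iterating over the TarFile reads headers lazily, so the loop can stop
    once every wanted member has been found instead of indexing the whole
    archive first.
    """
    wanted = set(wanted)
    found = {}
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if member.name in wanted and member.isfile():
                # extractfile() gives a file-like object; nothing touches disk
                found[member.name] = tar.extractfile(member).read()
                if len(found) == len(wanted):
                    break  # skip the rest of the archive
    return found
```

Usage: `read_members("bundle.tar", ["path/to/file1", "path/to/file2"])` returns a dict mapping each found name to its bytes.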

score 1 · Answer 2 · answered Dec 06 '13 at 22:56


Perhaps you want something like this?

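#!/usr/bin/env python3

import tarfile


def main():
    # tarfile.open() also auto-detects gzip/bz2/xz compression
    with tarfile.open('tar-archive.tar', 'r') as archive:
        read_into_memory = False  # True: read the data; False: write it to disk
        if read_into_memory:
            # extractfile() returns a file-like object (None for directories)
            file_ = archive.extractfile('etc/protocols')
            print(file_.read())
        else:
            # Writes etc/protocols relative to the current working directory
            archive.extract('etc/protocols')


if __name__ == '__main__':
    main()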


dstromberg


Thanks, dstromberg, for your answer. The protocols file would be extracted to the current working directory, right? Can the tarball be read directly from the remote SVN repo? – Rajan Pathak Dec 06 '13 at 23:04
The if statement lets you choose between extracting into memory (extractfile) or to disk (extract); use whichever you prefer. If you want to read a file from SVN and you're on Linux, you might try svnfs: http://www.jmadden.eu/index.php/svnfs/ . If you're not on Linux, or you want to avoid a new filesystem, you could run "svn export http://host.name.com/dir/file.tar" before using the code above. – dstromberg Dec 07 '13 at 00:00

John1024 · Accepted Answer · 2013-12-07T21:44:09.050


Here is one way to get a tar file from svn and extract a single file from it:

import tarfile
from subprocess import check_output

# Capture the tar file from Subversion and save it locally
tmp = '/home/me/tempfile.tar'
with open(tmp, 'wb') as f:
    f.write(check_output(["svn", "cat", "svn://url/some.tar"]))
# Extract the file we want, saving it under dir2/
with tarfile.open(tmp) as tar:
    tar.extract('dir1/fname.ext', path='dir2')

where 'dir1/fname.ext' is the full path to the file that you want within the tar archive. It will be saved in 'dir2/dir1/fname.ext'. If you omit the path argument, it will be saved in 'dir1/fname.ext' under the current directory.

The above can be understood as follows. On a normal shell command line, svn cat url tells subversion to send the file defined by url to stdout (see svn help cat for more info). url can be any type of url that svn understands such as svn://..., svn+ssh://..., or file://.... We run this command under python control using the subprocess module. To do this the svn cat url command is broken up into a list: ["svn", "cat", "url"]. The output from this svn command is saved to a local file defined by the tmp variable. We then use the tarfile module to extract the file you want.
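One caveat for a very large tarball: `check_output()` collects the whole archive in memory before it is written to `tmp`. A minimal sketch of streaming the command's stdout straight into the file instead (the helper name `fetch_to_file` is hypothetical; the command list is the same one used above):

```python
import subprocess

def fetch_to_file(cmd, dest):
    """Run cmd and stream its stdout directly into the file dest.

    Unlike check_output(), the child process writes into the open file,
    so the archive never has to fit in memory.
    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    with open(dest, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)
    return dest

# e.g. fetch_to_file(["svn", "cat", "svn://url/some.tar"], "/home/me/tempfile.tar")
```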

Alternatively, you could use the extractfile method to capture the file data to a python variable:

with tarfile.open(tmp) as t:
    handle = t.extractfile('dir1/fname.ext')
    print(handle.readlines())  # show file contents (as a list of bytes lines)
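The same method is handy if you want the file to land at a path of your choosing, e.g. dir2/fname.ext rather than the dir2/dir1/fname.ext that `extract(..., path='dir2')` produces. A sketch that copies the member's data stream yourself; the helper name `extract_member_as` is hypothetical:

```python
import os
import shutil
import tarfile

def extract_member_as(tar_path, member_name, dest_path):
    """Write one archive member to dest_path, dropping its directory prefix.

    extract(name, path=...) recreates the member's own directories under
    path; copying the data stream yourself lets you pick the final name.
    """
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with tarfile.open(tar_path) as tar:
        src = tar.extractfile(member_name)  # None if not a regular file
        if src is None:
            raise ValueError(f"{member_name} is not a regular file")
        with src, open(dest_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return dest_path
```

Usage: `extract_member_as('/home/me/tempfile.tar', 'dir1/fname.ext', 'dir2/fname.ext')`.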

According to the documentation, tarfile should accept a subprocess's stdout as a file handle. This would simplify the code and eliminate the need to save the tar file locally. However, due to a bug, Issue 10436, that will not work.
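A possible workaround, assuming Python 3: open the pipe in tarfile's stream mode (`"r|*"`) and read each member while iteration is positioned on it. As I understand the limitation, asking `extractfile()` for a member by name first scans the rest of the archive and then needs to seek backwards, which a pipe cannot do; reading the current member only moves forward. This is a sketch, not the answer's method; the helper name `extract_from_pipe` is hypothetical:

```python
import subprocess
import tarfile

def extract_from_pipe(cmd, wanted):
    """Read the wanted members from a tar stream written to cmd's stdout.

    mode "r|*" treats the pipe as a forward-only stream (compression is
    auto-detected). Each member is read as soon as iteration reaches it,
    so tarfile never has to seek backwards.
    """
    wanted = set(wanted)
    found = {}
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        with tarfile.open(fileobj=proc.stdout, mode="r|*") as tar:
            for member in tar:
                if member.name in wanted and member.isfile():
                    found[member.name] = tar.extractfile(member).read()
                    if len(found) == len(wanted):
                        break  # stop reading once everything is found
    finally:
        proc.stdout.close()  # a still-writing child gets a broken pipe
        proc.wait()
    return found
```

With the real repository, `cmd` would be `["svn", "cat", "svn://url/some.tar"]`. svn still has to send every byte up to the last wanted member, but nothing after it, and no temporary file is written.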

edited Dec 07 '13 at 21:44

answered Dec 06 '13 at 23:04

John1024


Thanks, John1024. To extract a file from the fetched tarball, my call would be t.extract('dir/fname.ext'), right? Also, is it possible to read/extract the tar file remotely, I mean directly from the SVN repo? – Rajan Pathak Dec 06 '13 at 23:16
Yes on the `extract` syntax. You can use the python module `pysvn` to get the tar file via svn. For examples, see [http://pysvn.tigris.org/docs/pysvn_prog_guide.html]. – John1024 Dec 06 '13 at 23:23
@RajanPathak I just updated the answer with a method that starts with extracting the tar file via svn. – John1024 Dec 07 '13 at 09:21
Thanks, John, for your kind response. One thing I wanted to know about the Popen call: how does it work? What do its parameters "svn" and "cat" mean here, and do I need to provide an svn+ssh URL, or is svn alone enough? – Rajan Pathak Dec 07 '13 at 09:31
Also, would it download the tarball to my local machine and then extract the file, or would it read the tarball remotely and just place the required files on my machine? – Rajan Pathak Dec 07 '13 at 09:36
Yes, you can use any url that svn accepts. `svn` is the name of the subversion executable. `cat` is the command that tells subversion to send the output of the url to stdout. It might be possible to extract the file remotely but it would depend on things that you haven't told us: What kinds of access do you have to the server besides svn? Do you have shell access, say via ssh, which allows you to run python or shell (bash or ?) scripts on the server? This might warrant a separate question. – John1024 Dec 07 '13 at 21:55
Thanks @John for your detailed answer. My primary goal is to reduce the time it takes to fetch the tar bundle. With my current implementation it takes almost an hour to fetch the tarball from the SVN repo and then extract just two files from it. Let me try your solution and see how long it takes. – Rajan Pathak Dec 08 '13 at 16:01
