
I have a very large .tar.gz file which I can't extract all at once because I don't have enough disk space. I would like to extract half of its contents, process them, and then extract the remaining half.

The archive contains several subdirectories, which in turn contain files. When I extract a subdirectory, I need all its contents to be extracted with it.

What's the best way of doing this in bash? Does tar already allow this?

Martin Tournoij
Ricky Robinson

3 Answers


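You can also extract files one at a time:

tar -zxvf file.tar.gz -C DESTINATION/dir PATH/to/file/inside_archive

(DESTINATION/dir must already exist; the member is recreated inside it under its full path from the archive.)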

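You can wrap this in a script:

1) Keep the same relative paths under DESTINATION as inside the archive (you can still choose your own base directory for DESTINATION).

2) Get the paths of the files inside the archive with

tar -ztf file.tar.gz

(adding -v, as in tar -ztvf, also prints permissions, owner, size and date, which you would then have to strip off again).

3) Loop over those paths and define a break condition as required. Reading them with while read is safer than for files in $(...), which breaks on names that contain spaces.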

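I would do something like this:

#!/bin/bash
# Extract the archive one member at a time into ./My_localDir,
# keeping the paths stored in the archive.
mkdir -p ./My_localDir
tar -ztf file.tar.gz | while IFS= read -r files
do
    subDir=$(dirname "$files")
    echo "$subDir"
    # -C points at the base directory; tar recreates "$subDir" under it
    tar -C ./My_localDir -zxvf file.tar.gz "$files"
done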

$subDir contains the name of each file's subdirectory.

Add a break condition to the loop above as required.
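A minimal sketch of such a break condition, assuming bash and GNU or BSD tar. To keep it runnable it first builds a small throwaway demo archive in a temporary directory; that demo layout and the max_files limit are hypothetical, not part of the answer. It skips directory entries and stops after max_files regular files. Note that every tar call decompresses the archive again from the start, so this gets slow for a very large file.

```shell
#!/bin/bash
# Sketch: extract members one at a time and stop after max_files regular files.
# The demo archive and max_files are hypothetical stand-ins for the real data.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Build a small demo archive with the layout archive/{a,b,c}/fileN
mkdir -p src/archive/a src/archive/b src/archive/c
touch src/archive/a/file1 src/archive/a/file2 src/archive/b/file3 src/archive/c/file4
tar -czf file.tar.gz -C src archive

max_files=2   # hypothetical break condition: stop after this many files
count=0
mkdir -p My_localDir
while IFS= read -r member; do
    case "$member" in
        */) continue ;;   # directory entries end in "/"; their files are listed separately
    esac
    # -C points at the base directory; tar recreates the member's own path under it
    tar -C My_localDir -zxf file.tar.gz "$member"
    count=$((count + 1))
    if [ "$count" -ge "$max_files" ]; then
        break
    fi
done < <(tar -ztf file.tar.gz)

echo "extracted $count files into $workdir/My_localDir"
```

Which two files end up extracted depends on the order in which tar stored them, so the break condition should be based on something you control (a counter, a size budget, a name check) rather than on a specific order.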

PradyJord
  • Thanks. Can I just list all subdirectories inside the archive (they are all at the first level of the hierarchy) and extract the first n of them? Would it be easier? – Ricky Robinson Jun 05 '14 at 10:53
  • Check whether the part I just added to the answer helps. – PradyJord Jun 05 '14 at 11:02
  • Thanks. I don't understand where you get "dirname", though. My idea was to loop over its subdirectories (they are all at the first level of the file hierarchy of this archive) and extract the first n of them by keeping a very simple counter. In general, I'm very confused. This takes 3 seconds in a graphical environment... :/ – Ricky Robinson Jun 05 '14 at 11:21
  • I just can't find a way of listing the contents of the archive in a **non-recursive** way. That way I would only get the names of these subdirectories and could hopefully extract them directly... – Ricky Robinson Jun 05 '14 at 11:23
  • @RickyRobinson Check whether it works for you NOW. I was on my phone, so please excuse the hassle. – PradyJord Jun 05 '14 at 12:31
  • Hey, thanks, but it's OK! I already found a solution, which I have posted here :) I can accept your answer, since you clearly wanted to find a solution for me. :) – Ricky Robinson Jun 05 '14 at 15:28

You can, for example, extract only the files that match a pattern:

tar -xvzf largefile.tar.gz --wildcards --no-anchored '*.html'

Depending on the structure of largefile.tar.gz, you can then extract the files matching one pattern, process them, delete them, extract the files matching the next pattern, and so on.
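A minimal sketch of that extract, process, delete cycle, assuming bash and GNU tar (--wildcards and --no-anchored are GNU tar options). The demo archive, the pattern list and process_files are hypothetical placeholders; here the "processing" only counts the extracted files.

```shell
#!/bin/bash
# Sketch: extract one wildcard pattern at a time, process, then delete to free space.
# Demo archive, patterns and process_files are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Build a small demo archive: site/pages/*.html and site/img/*.png
mkdir -p src/site/pages src/site/img
touch src/site/pages/index.html src/site/pages/about.html src/site/img/logo.png
tar -czf largefile.tar.gz -C src site

process_files() {   # hypothetical placeholder for the real processing
    find site -type f | wc -l
}

processed=0
for pattern in '*.html' '*.png'; do
    # --no-anchored lets '*.html' match at any depth, e.g. site/pages/index.html
    tar -xzf largefile.tar.gz --wildcards --no-anchored "$pattern"
    n=$(process_files)
    processed=$((processed + n))
    rm -rf site   # delete this batch before extracting the next one
done
echo "processed $processed files"
```

If a pattern matches nothing, GNU tar reports "Not found in archive" and exits with a non-zero status, which stops this script because of set -e.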

  • sure, but I really need to extract subdirectories all together and keep the original structure of the archive. Sorry for not mentioning it before. – Ricky Robinson Jun 05 '14 at 10:37

OK, so based on this answer, I can list all contents at the desired depth. In my case, the tar.gz file is structured as follows:

archive.tar.gz:
archive/
archive/a/
archive/a/file1
archive/a/file2
archive/a/file3
archive/b/
archive/b/file4
archive/b/file5
archive/c/
archive/c/file6

So I want to loop over the subdirectories a, b and c and, for instance, extract the first two of them:

parent_folder='archive/'
max_num=2
counter=0
mkdir -p "$parent_folder"
for subdir in $(tar --exclude="*/*/*" -tzf archive.tar.gz); do
    if [ "$subdir" = "$parent_folder" ];
    then
        echo 'not this one'
        continue
    fi
    if [ "$counter" -lt "$max_num" ];
    then
        # $subdir already starts with archive/, so extract relative to the current directory
        tar -zxvf archive.tar.gz "$subdir"
        counter=$((counter + 1))
    fi
done

Next, once the first batch has been processed, extract the remaining subdirectories of the same archive ($parent_folder is still set from above):

max_num=2
counter=0
mkdir -p "$parent_folder"
for subdir in $(tar --exclude="*/*/*" -tzf archive.tar.gz); do
    if [ "$subdir" = "$parent_folder" ];
    then
        echo 'not this one'
        continue
    fi
    if [ "$counter" -ge "$max_num" ];
    then
        tar -zxvf archive.tar.gz "$subdir"
    fi
    counter=$((counter + 1))
done
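For comparison, a minimal sketch that does the whole split in one script, assuming bash, GNU or BSD tar, and directory names without newlines. It lists the second-level directories once, extracts the first half, processes and deletes it, then does the same with the second half. The demo archive mirrors the layout shown above; extract_and_process and its file-counting "processing" are hypothetical placeholders.

```shell
#!/bin/bash
# Sketch: extract the subdirectories of archive/ in two halves, processing and
# deleting each half before extracting the next. Demo data and processing are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Demo archive with the same layout as archive.tar.gz above
mkdir -p src/archive/a src/archive/b src/archive/c
touch src/archive/a/file1 src/archive/a/file2 src/archive/a/file3 \
      src/archive/b/file4 src/archive/b/file5 src/archive/c/file6
tar -czf archive.tar.gz -C src archive

# Collect the distinct second-level directories ("archive/a", "archive/b", ...)
subdirs=()
while IFS= read -r d; do
    subdirs+=("$d")
done < <(tar -tzf archive.tar.gz | awk -F/ 'NF >= 3 { print $1 "/" $2 }' | sort -u)

half=$(( (${#subdirs[@]} + 1) / 2 ))

extract_and_process() {
    if [ "$#" -eq 0 ]; then   # with no member names, tar would extract everything
        echo 0
        return
    fi
    tar -xzf archive.tar.gz "$@"   # naming a directory extracts everything under it
    find archive -type f | wc -l   # hypothetical processing step: count the files
    rm -rf archive                 # free the space before the next batch
}

first=$(extract_and_process "${subdirs[@]:0:half}")
second=$(extract_and_process "${subdirs[@]:half}")
echo "first half: $first files, second half: $second files"
```

Splitting by number of subdirectories only halves the disk usage if the subdirectories are of similar size; tar -tzvf also prints each member's size if you need to balance by bytes instead.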
Community
Ricky Robinson