I am trying to extract the last 2% of the lines output by the zcat command. I tried something like this:

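numlines=$(zcat file.tar.gz | wc -l)
# start right after the first 98% of lines; bash arithmetic is integer-only, so multiply first
zcat file.tar.gz | tail -n +$(( numlines * 98 / 100 + 1 ))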

The problem with this approach is that my file is very large, and I can't afford to run the zcat command twice. Is there a way to do this in a single pass, perhaps by piping the number of lines, or in some other way?

EDIT: The output of `zcat file.tar.gz | tar -xO | dd 2>&1 | tail -n 1` is:

16942224047 bytes (17 GB, 16 GiB) copied, 109.154 s, 155 MB/s

Any help would be greatly appreciated.
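The EDIT reports the total uncompressed size (16942224047 bytes). If a byte-based cut is acceptable, which is an assumption since the attempt above counts lines, that number is enough to take the last 2% in one pass with `tail -c`. This is a minimal sketch; `last_bytes_percent` is a hypothetical helper name. The cut may land in the middle of a line, and `tail` has to hold about that many bytes in memory when reading from a pipe.

```bash
#!/usr/bin/env bash
# last_bytes_percent TOTAL PERCENT (hypothetical helper): print the last
# PERCENT% of stdin, measured in bytes, given the TOTAL size in bytes.
last_bytes_percent() {
  local total=$1 percent=$2
  # bash arithmetic is integer-only, so multiply before dividing (rounds down)
  tail -c "$(( total * percent / 100 ))"
}

# Usage, with the size reported by dd in the question:
# zcat file.tar.gz | tar -xO | last_bytes_percent 16942224047 2
# 16942224047 * 2 / 100 = 338844480 bytes kept
```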

Pujan Paudel
  • The last 2% of a tar file? But that means you're not going to be able to use it with tar as it's not a complete archive... – Shawn Jun 19 '20 at 19:12
  • If decompression speed is an issue, consider using zstd instead of gzip. – Shawn Jun 19 '20 at 19:13
  • Please add output of `zcat file.tar.gz | tar -xO | dd 2>&1 | tail -n 1` to your question. – Cyrus Jun 19 '20 at 19:18
  • @Cyrus I added it, please take a look. – Pujan Paudel Jun 19 '20 at 19:27
  • @Shawn, I am referring to the last 2% of the contents of a tar file. I can't decompress the file, so I'm using zcat just to read the contents. – Pujan Paudel Jun 19 '20 at 19:33
  • But the last X% of a tar archive can be a partial file, or part of one file and 1 or more other files... In the latter case discarding everything before the first complete file header would work for turning it into a usable archive. But your goal doesn't really make sense. And what do you think zcat does if not decompress? – Shawn Jun 19 '20 at 19:52
  • See https://stackoverflow.com/a/17331179/1745001. Good luck! – Ed Morton Jun 19 '20 at 19:55
  • @PujanPaudel `and I can't afford to run the zcat command twice` You have to. It's impossible to calculate the size of a slice of cake without knowing the size of the whole cake: you have to know how many lines there are to calculate 2% of them. You could guess an upper bound for 2% of the lines and buffer that many while counting the lines, but if that guess is too small you will have to re-run zcat anyway. – KamilCuk Jun 23 '20 at 07:12
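KamilCuk's buffering idea can be sketched in one pass: keep only the last K lines in a ring buffer while counting, then check at the end whether 2% of the count fits in K. This is a minimal sketch assuming K is a guessed upper bound; `tail_guess` is a hypothetical helper name. If the guess is too small, the needed lines have already been discarded and it exits with an error, which is exactly the case where zcat would have to be re-run.

```bash
#!/usr/bin/env bash
# tail_guess K PERCENT (hypothetical helper): one pass over stdin, keeping only
# the last K lines in a ring buffer while counting all lines.
tail_guess() {
  awk -v k="$1" -v n="$2" '
    { buf[NR % k] = $0 }                  # ring buffer: slot NR % k holds line NR
    END {
      want = int(NR * n / 100)            # size of the last n%, rounded down
      if (want > k) {                     # guess was too small: the lines are gone
        print "need " want " lines, buffer holds " k > "/dev/stderr"
        exit 1
      }
      for (i = NR - want + 1; i <= NR; i++) print buf[i % k]
    }'
}

# Usage (5000000 is an arbitrary guess for the upper bound):
# zcat file.tar.gz | tar -xO | tail_guess 5000000 2
```

Memory is bounded by K lines regardless of input size, at the cost of failing when the guess is wrong.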

2 Answers

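Read the content into a variable. This assumes there is enough RAM available.

content=$(zcat file.tar.gz | tar -xO)
lines=$(wc -l <<<"$content")
# lines in the last 2%: total minus the first 98% (integer arithmetic, so multiply first)
last=$(( lines - lines * 98 / 100 ))
tail -n "$last" <<<"$content"

Because bash arithmetic is integer-only, the 98% cut-off is rounded down, so the number of lines printed is rounded up.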

Cyrus
  • I have a fair amount of RAM in my system, but I got `xrealloc: cannot allocate 18446744071562067968 bytes`. I guess the file is just too big. Are there any ideas that don't consume so much memory? – Pujan Paudel Jun 19 '20 at 19:05

The following awk program keeps only the last n% of the lines read so far in memory. The percentage is rounded down: if n% of the file is 134.56 lines, it prints 134 lines.

awk -v n=2 '{a[FNR]=$0; min=FNR-int(FNR*n/100)}
            {i=min; while(i in a) delete a[i--]}
            END{for(i=min+1;i<=FNR;++i) print a[i]}' - < <(zcat file)

You can verify this by replacing `zcat file` with `seq 100`.
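The awk program above can be wrapped in a shell function that reads standard input, which makes the sliding window easy to test with `seq`. This is a sketch: `last_percent` is a hypothetical name, and the awk body is the program above with comments added. Because the window holds about n% of the lines read so far, memory grows to roughly n% of the input (for the 17 GB stream in the question, a few hundred MB of text plus awk's per-element overhead).

```bash
#!/usr/bin/env bash
# last_percent N (hypothetical helper): print the last N% of stdin's lines,
# rounded down, keeping only that window in memory.
last_percent() {
  awk -v n="$1" '
    # store the line; the window to keep is lines min+1 .. FNR
    { a[FNR] = $0; min = FNR - int(FNR * n / 100) }
    # delete lines at or below min that are still stored
    { i = min; while (i in a) delete a[i--] }
    # print the remaining window in order
    END { for (i = min + 1; i <= FNR; ++i) print a[i] }'
}

# Usage: zcat file.tar.gz | tar -xO | last_percent 2
```

With `seq 100` as input and n=2 it prints 99 and 100; with fewer than 50 lines it prints nothing, since 2% rounds down to zero.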

kvantour