I have a script that generates a tar archive using the command

tar -zacf /tmp/foo.tar.gz /home/yotam/foo

It then checks whether a tar file already exists in a certain folder and whether there are any changes between the two archives. If so, it keeps the new one:

if ! [ -e /home/yotam/bar/foo.tar.gz ]; then
    cp /tmp/foo.tar.gz /home/yotam/bar/foo.tar.gz
    cond=1
else 
    #compare
    diff --brief <(sort /tmp/foo.tar.gz) <(sort /home/yotam/bar/foo.tar.gz) >/dev/null
    cond=$?

fi

if [ $cond -eq 1 ]; then
    rm /home/yotam/bar/foo.tar.gz
    cp /tmp/foo.tar.gz /home/yotam/bar/foo.tar.gz
fi

However, this script always sees the two archive files as different, even if I haven't changed anything in either archive or in the foo folder itself. What is wrong with my check?

Edit:

For what it's worth, replacing the diff line with

diff --brief  /tmp/foo.tar.gz /home/yotam/bar/foo.tar.gz >/dev/null

yields the same result.

Yotam
  • it looks a bit weird to me the `sort file.gz`. Shouldn't you `cat file.gz |sort`? I don't think `sort` can handle gzipped files. – fedorqui Apr 11 '15 at 22:03
  • @fedorqui the two options you mention are the same. Anyway, sorting a gz or tar file does not make much sense... – Diego Apr 11 '15 at 22:09
  • @Diego op true, I wanted to say `zcat file.gz | sort` – fedorqui Apr 11 '15 at 22:10

2 Answers

I'm not sure a gzip archive can be used like a hash function. The gzip header includes a timestamp field, so the gzip step probably depends on the current date and time and produces different output on each run.

I'd recommend using a widely used hash function instead. Take a look at the hash git uses internally (SHA-1, available on the command line as shasum), for example.

More at: How does git compute file hashes?
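As a rough sketch of the hashing idea: hashing the .tar.gz files directly would still pick up any difference in the gzip header, so this example hashes the decompressed tar stream instead. The function names `payload_hash` and `same_payload` are made up for illustration, and `sha256sum` is assumed to be installed (on macOS, `shasum -a 256` would be the likely substitute).

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 of the decompressed contents of a
# .gz file, so differences in the gzip header (such as its timestamp) are ignored.
payload_hash() {
    gzip -dc -- "$1" | sha256sum | cut -d ' ' -f 1
}

# Hypothetical helper: succeeds (exit status 0) when both archives
# decompress to exactly the same bytes.
same_payload() {
    [ "$(payload_hash "$1")" = "$(payload_hash "$2")" ]
}
```

With these defined, something like `same_payload /tmp/foo.tar.gz /home/yotam/bar/foo.tar.gz && echo unchanged` could replace the diff call in the question.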

Vladimir Posvistelik
  • yes there must be some time field in the way. If, instead, you diff the tar archive they are identical, weird hehe – Diego Apr 11 '15 at 22:19

It looks like you're doing a line-wise compare of gzipped tar archives, after sorting the lines. There are multiple reasons why this is a bad idea (for one: sorting by line doesn't make sense for something that is gzipped). To check whether two files are identical, either use diff file1 file2, or calculate a hash for each file (with md5 filename or md5sum filename) and compare those.

The problem is that gzip stores metadata in the header of the archive, such as the name of the file it compressed and a timestamp. If you have two identical files and gzip them, you can therefore end up with two different archives.
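One way to sidestep the header metadata, sketched here as an assumption rather than something tested against the original script, is to run gzip yourself with `-n`, which tells it not to store the original name and timestamp. The function name `make_archive` is hypothetical, and the sketch assumes the tar stream itself is stable as long as nothing in the folder changes.

```shell
#!/bin/sh
# Hypothetical helper: make_archive DIR OUTPUT
# Writes DIR as a tar stream to stdout and compresses it with gzip -n,
# so the gzip header carries no file name and no timestamp.
make_archive() {
    tar -cf - -C "$(dirname "$1")" "$(basename "$1")" | gzip -n > "$2"
}
```

For the paths in the question this would be called as `make_archive /home/yotam/foo /tmp/foo.tar.gz`. Note that, unlike the original tar command, this stores the folder under a relative name (`foo/...`) instead of the full path.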

So what can you do to solve this? For one, you can compare gunzipped versions of both files: diff <(gzcat out/out2.tar.gz) <(gzcat out2.tar.gz) (on most Linux systems the command is zcat rather than gzcat). I assume you have the sort in there in case the files get tarred in a different order, but I don't think you have to worry about that. If that is a problem for you, check out something like tarsum. This will give you a better result, since with sort you would not notice a line moving from one file to another, or two lines in a file being swapped.
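Put together, the check-and-copy step from the question might look roughly like the sketch below. It compares the decompressed streams byte for byte with `cmp` and avoids process substitution so it also runs under plain sh. The function name `update_if_changed` and its two arguments are placeholders, not part of the original script.

```shell
#!/bin/sh
# Hypothetical helper: update_if_changed NEW OLD
# Copies NEW over OLD when OLD is missing, or when the two archives
# decompress to different tar streams. Otherwise OLD is left alone.
update_if_changed() {
    new=$1
    old=$2
    if [ ! -e "$old" ]; then
        cp -- "$new" "$old"
        return
    fi
    tmp=$(mktemp) || return 1
    gzip -dc -- "$old" > "$tmp"
    # cmp -s prints nothing: exit status 0 if identical, 1 if different.
    if ! gzip -dc -- "$new" | cmp -s - "$tmp"; then
        cp -- "$new" "$old"
    fi
    rm -f -- "$tmp"
}
```

For the question's paths the call would be `update_if_changed /tmp/foo.tar.gz /home/yotam/bar/foo.tar.gz`.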

Claude