
I have a script to verify that a tar file is valid. I'm using cat in this example, but in practice I'm validating Amazon S3 files that are streamed in.

#!/bin/bash
cat $1  | tar zxf  -  > /dev/null
if [ $? -eq 0 ]; then
  echo  "File is ok ... $1"
else
  echo  "File is corrupted ... $1"
fi

The trouble is that tar extracts the files in the .tar.gz. I've tried different variations like tar -C /dev/null, but with no luck. It either fails or it writes the files to disk.

How do I read through the tar file without it writing the files? A few other posts have recommended tar t to get the file listing. But I'm not 100% sure that just getting the file listing will verify the integrity of the files the tar contains.

ForeverConfused
  • fyi -- `cat $1` is buggy (if you have `space with filenames.tar.gz`, it'll try to concatenate a file named `space` with a file named `with` and a third file named `filenames.tar.gz`); use quotes: `cat "$1"`, or better, don't use it at all (`tar zxf "$1"`). – Charles Duffy Jan 11 '17 at 17:41
  • Also, if you don't want to actually write the files, don't use `x` (extract). – Charles Duffy Jan 11 '17 at 17:41
  • Also, don't check `$?` when you can just put the thing you want to test directly in your `if`, like so: `if tar tzf "$1"; then echo "File is ok"; else echo "File is corrupt"; fi` – Charles Duffy Jan 11 '17 at 17:42
  • To be clear, by the way: **It's `gzip`, not `tar`, that tracks checksums to verify that a file is correct and intact**. That means that *any* `tar` operation that reads the whole file -- including `-t` -- will be fine, since in order for content to be read, it has to have been decompressed by `gunzip`. – Charles Duffy Jan 11 '17 at 17:45
  • While I 100% agree with @CharlesDuffy I think it bears mentioning that you could just stick the `O` flag in your `tar` command to have it write to stdout. Check out `man tar` for more info. Again though, probably not the best method for checking for corruption. – JNevill Jan 11 '17 at 17:46
  • ...you could also leave `tar` out entirely and use `gunzip -t` to test that the data is intact: That won't catch bugs that happened during its creation (if what was originally compressed wasn't a valid tarball it won't get caught), but it'll absolutely catch any kind of modification or truncation in-flight (that wasn't either crafted to fake the checksums or very, very lucky). – Charles Duffy Jan 11 '17 at 17:46
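
Pulling the comments above together, here is a minimal sketch of a check that reads the archive from stdin and never writes its members to disk. It assumes a tar that supports `z` on a stream (for example GNU tar with gzip installed); `check_targz` is a hypothetical helper name, not something from the original post.

```shell
#!/bin/bash
# Sketch only: validate a .tar.gz arriving on stdin without extracting it.
# check_targz is a hypothetical helper name used for illustration.
check_targz() {
  # t lists the members instead of extracting them (x), so nothing is written.
  # z makes tar read through gzip, so gzip's checksum is verified as the
  # whole stream is decompressed. Listing and error output are discarded.
  tar tzf - > /dev/null 2>&1
}

# Run the check only when a file argument is given, mirroring the original script.
if [ "$#" -gt 0 ]; then
  # Test the command directly in the if instead of inspecting $? afterwards.
  if check_targz < "$1"; then
    echo "File is ok ... $1"
  else
    echo "File is corrupted ... $1"
  fi
fi
```

Because the function reads stdin, any command that writes the archive to stdout can be piped into it in place of the `< "$1"` redirection.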

1 Answer


If possible, I would generate a hash on the file on your S3 bucket, then compare that to the hash after you've downloaded the file.
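
A rough sketch of that comparison, assuming a trusted SHA-256 digest of the original is available and that `sha256sum` (GNU coreutils) is installed. `verify_sha256` and its arguments are hypothetical names for illustration.

```shell
#!/bin/bash
# Sketch only: compare a downloaded file's SHA-256 against a known-good digest.
# verify_sha256 is a hypothetical helper; the expected digest must come from
# a trusted source, such as a hash computed where the file was produced.
verify_sha256() {
  local file=$1 expected=$2 actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest field.
  actual=$(sha256sum "$file" | cut -d ' ' -f 1) || return 1
  # Succeed only when the computed digest matches the expected one exactly.
  [ "$actual" = "$expected" ]
}
```

It could be used as `if verify_sha256 "$1" "$expected_digest"; then echo "File is ok"; fi`, where `expected_digest` is a placeholder for the trusted value.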

  • Yup. In an ideal world, one uses content-hashed storage, so the *name* of the bucket is also its checksum -- there's no knowledge of the file format at all needed to verify in that case. – Charles Duffy Jan 11 '17 at 17:48
  • These are files exported by a vendor, from a console I have no access to. I have no way to checksum the originals; otherwise I would have done a simple md5. – ForeverConfused Jan 11 '17 at 18:03