
I have numerous image files in a directory. Some of them have been corrupted and contain only zero bytes (hex 00), although some of those are very large; the rest are not corrupt. I want to search through the directory, identify the files that contain only zeros, and delete them.

I am running this from Terminal on a Mac.

I am thinking of a recursive grep that stops searching a file, and excludes it from deletion, as soon as it finds a value other than zero, followed by rm to delete the zero-only files. However, I am a novice and do not know how to put it together.

Alternatively, the zero-only files could be moved to a directory from which I can delete them (I want to be careful not to delete any good image files).

  • if your grep supports -z option, can you check if `grep -zL '[^0]'` works? use find for recursive search, or `grep -r` if you have GNU grep – Sundeep May 27 '19 at 15:22

2 Answers


To test if a file named $fname contains only hex zeros, try:

head -c "$(wc -c <"$fname")" /dev/zero | cmp -s - "$fname"

Here, head -c "$(wc -c <"$fname")" /dev/zero produces a stream of zero bytes exactly as long as the file $fname, and cmp -s - "$fname" compares that stream, read from standard input, with the file itself. Nothing is written to disk. If the two match, cmp exits with status 0 (success); otherwise it exits with a nonzero status.
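As a rough sketch, the same test can be wrapped in a small shell function so it is easier to reuse. The name is_all_zero is made up for illustration and is not part of the original answer. The $(( ... )) wrapper is an added assumption: it strips the padding that some wc implementations print around the byte count. Note that an empty file also passes this test.

```shell
# is_all_zero FILE: hypothetical helper, not from the original answer.
# Exit status 0 means FILE is a regular file whose bytes are all 0x00.
is_all_zero() {
    [ -f "$1" ] || return 1
    # $(( ... )) strips any whitespace that wc prints around the count.
    head -c "$(( $(wc -c <"$1") ))" /dev/zero | cmp -s - "$1"
}

# Example use (photo.jpg is a placeholder name):
#   is_all_zero photo.jpg && echo "photo.jpg contains only zeros"
```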

To list all regular files in a directory which contain only hex zeros:

for fname in ./*
do
   [ -f "$fname" ] && head -c "$(wc -c <"$fname")" /dev/zero | cmp -s - "$fname" && echo "$fname"
done

To delete all regular files in a directory which contain only hex zeros, we just replace echo with rm:

for fname in ./*
do
   [ -f "$fname" ] && head -c "$(wc -c <"$fname")" /dev/zero | cmp -s - "$fname" && rm "$fname"
done

Here, [ -f "$fname" ] tests whether $fname is a regular file rather than a directory. head -c "$(wc -c <"$fname")" /dev/zero | cmp -s - "$fname" tests whether the file contains only hex zeros. If cmp returns success, rm "$fname" deletes that file.
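The question also asks about moving suspect files aside rather than deleting them outright. Below is a minimal sketch of that, assuming the destination directory lies outside the tree being searched. The function name quarantine_zero_files and both of its arguments are hypothetical, and find is used so that subdirectories are covered too. Empty files are skipped.

```shell
# quarantine_zero_files SRC DEST: hypothetical helper, not from the original answer.
# Moves every non-empty, all-zero regular file under SRC into DEST and prints its old path.
# Assumption: DEST is outside SRC, so find never revisits files that were already moved.
quarantine_zero_files() {
    mkdir -p "$2" || return 1
    find "$1" -type f -size +0 -exec sh -c '
        dest=$1; shift
        for f in "$@"; do
            size=$(( $(wc -c <"$f") ))
            if head -c "$size" /dev/zero | cmp -s - "$f"; then
                target=$dest/${f##*/}
                if [ -e "$target" ]; then
                    # Never overwrite a file that is already in DEST.
                    printf "skipped, name already taken: %s\n" "$f" >&2
                else
                    mv "$f" "$target" && printf "%s\n" "$f"
                fi
            fi
        done
    ' sh "$2" {} +
}
```

For example, quarantine_zero_files ./images ./zero-files (both paths are placeholders) prints what was moved, and the quarantine directory can be inspected before anything is deleted.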

John1024
  • I will check this out. I think that your solution of creating files of the same length filled with zeros to compare against the original files may not be efficient, or may be prone to errors: I have hundreds of files, each about 20 MB. – kirkinsonoma May 27 '19 at 14:44
  • @kirkinsonoma While many Unix tools are designed to handle text and might work unreliably with binary data such as you have, `cmp` is not one of them. I chose `cmp` for this because, in all versions, it is intended to work reliably with binary data. – John1024 May 29 '19 at 06:11

Piping find into gawk into xargs is very efficient.

Remove the echo safeguard once you have confirmed that the printed command is the one you want.

#!/bin/bash
mapfile -t Files < <(find . -type f -not -empty)    #1
gawk '
  /[^\x00]/ {f=1; nextfile}                         #2
  ENDFILE {if(!f) printf "%s%c", FILENAME, 0; f=0}  #3
' "${Files[@]}" |xargs -0 echo rm                   #4
  1. Recursively store the paths of all non-empty regular files under the current directory in the array "Files".
  2. If any line contains a character other than a hex zero, set the flag and skip to the next file.
  3. After nextfile or a normal EOF, print the filename (NUL-terminated, so that names containing spaces survive) if the flag is unset, then reset the flag.
  4. Feed gawk our find results, then use xargs -0 to construct a single rm command.
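The early-exit idea in steps 2 and 3 can also be sketched without gawk or mapfile. That may matter on a stock macOS install, which ships bash 3.2 (no mapfile) and does not include gawk. This is a swapped-in technique that uses tr and head instead of the gawk script above, and list_zero_files is a hypothetical name:

```shell
# list_zero_files [DIR]: hypothetical helper; DIR defaults to the current directory.
# Prints the path of every non-empty regular file that contains only 0x00 bytes.
list_zero_files() {
    find "${1:-.}" -type f -size +0 -exec sh -c '
        for f in "$@"; do
            # tr deletes NUL bytes; head keeps at most one surviving byte and exits,
            # so a file with real data is abandoned soon after its first non-zero byte.
            n=$(LC_ALL=C tr -d "\000" <"$f" | head -c 1 | wc -c)
            if [ "$((n))" -eq 0 ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

Review the printed list first. Only then replace the printf line with rm "$f", or with a mv into a holding directory.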
vintnes
  • In line 2, gawk sets a flag to 1 if the file contains a value other than 0, correct? Does it stop processing and move on when it reaches the first non-zero value? If so, that is what I am looking for. I will have to study the remaining syntax to see what it is doing. Question: how do I run this from Terminal on a Mac? – kirkinsonoma May 27 '19 at 14:57
  • @kirkinsonoma Right, `nextfile` moves directly to `ENDFILE`. Test it by pasting in your terminal, then [save it as an executable script](https://stackoverflow.com/a/8779980/11053344). – vintnes May 27 '19 at 16:30
  • @kirkinsonoma You should [Upgrade Bash](https://itnext.io/upgrading-bash-on-macos-7138bd1066ba) – vintnes May 27 '19 at 16:42