
How do I make a Bash shell script that can identify all the .jpg, .gif, and .png files, and then identify which of these files are not linked via url(), href, or src in any text file in a folder?

Here's what I started with, but it gives me the inverse of what I want. I don't want the referenced images; I want the unreferenced (aka "orphaned") ones:

# Change MYPATH to the path where you have the project
find MYPATH -name '*.jpg' -exec basename {} \; > /tmp/patterns
find MYPATH -name '*.png' -exec basename {} \; >> /tmp/patterns
find MYPATH -name '*.gif' -exec basename {} \; >> /tmp/patterns

# Print a list of lines that reference these files
# (piping through cat simply removes grep's coloring)
grep -Rf /tmp/patterns MYPATH | cat

# Great, but how do I print the lines of /tmp/patterns that are *NOT* found
# in any *.php, *.css, or *.html file?
Volomike

3 Answers


With drysdam's help, I created this Bash script. I saved it as orphancheck.sh and run it with "./orphancheck.sh myfolder".

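#!/bin/bash

MYPATH=$1

# Collect the bare filename of every image under MYPATH.
# The globs are quoted so find, not the shell, expands them.
find "$MYPATH" \( -name '*.jpg' -o -name '*.png' -o -name '*.gif' \) -exec basename {} \; > /tmp/patterns

# Print each filename that no text file under MYPATH mentions:
# -I skips binary files, -q stops at the first match, -F matches the name literally.
while IFS= read -r p; do
    grep -RIqF -- "$p" "$MYPATH" || echo "$p"
done < /tmp/patterns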
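
A single-pass variant, offered as a sketch rather than a tested drop-in: instead of running grep once per image, it builds the sorted image list once, lets a single grep print every image name that occurs in a .php, .css, or .html file (the file types the question lists), and uses comm to keep the names that never occur. The function name orphan_images is hypothetical, and grep's -r, -o, and --include are GNU/BSD options rather than POSIX ones.

```shell
#!/bin/bash
# Sketch: orphan_images is a hypothetical helper, not from the answers above.
# Assumes GNU or BSD grep (for -r, -o and --include).
orphan_images() {
    local dir=$1 images
    images=$(mktemp) || return 1

    # Every image basename on disk, sorted and de-duplicated.
    find "$dir" -type f \( -name '*.jpg' -o -name '*.png' -o -name '*.gif' \) \
        -exec basename {} \; | sort -u > "$images"

    # One grep pass: print each image name that occurs literally (-F) in a
    # .php, .css or .html file, one match per line (-o), without filenames (-h).
    # comm -23 keeps lines found only in the first list, i.e. the orphans.
    grep -rhoF -f "$images" --include='*.php' --include='*.css' --include='*.html' "$dir" \
        | sort -u | comm -23 "$images" -

    rm -f "$images"
}
```

To use it, source the file and run orphan_images myfolder. As with the loop above, a name only counts as referenced if it appears literally, so paths assembled at runtime are not seen.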
Volomike

I'm a little late to the party (I found this page while looking for the answer myself), but in case it's useful to someone, here is a slightly modified version that returns the path with the filename (and searches for a few more file types):

#!/bin/bash

if [ $# -eq 0 ]; then
    echo "Please supply path to search under" >&2
    exit 1
fi
MYPATH=$1

# Collect the full path of every candidate file.
# The globs are quoted so find, not the shell, expands them.
find "$MYPATH" \( -name '*.jpg' -o -name '*.png' -o -name '*.gif' \
    -o -name '*.js' -o -name '*.php' \) > /tmp/patterns

# Search for each file's bare name; print its full path if no text file mentions it.
while IFS= read -r p; do
    f=$(basename "$p")
    grep -RIqF -- "$f" "$MYPATH" || echo "$p"
done < /tmp/patterns

It's important to note, though, that you can get false positives just looking at the code statically like this, because code might dynamically create a filename that is then referenced (and expected to exist). So if you blindly delete all files whose paths are returned by this script, without some knowledge of your project, you might regret it.
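
The scripts here count a file as referenced if its name appears anywhere, including in comments. The question asks specifically about url(), href, and src, so here is a narrower sketch that only counts an image when its name is the target of one of those. It assumes no whitespace around = or inside url(, strips any directory, query string, or #fragment from the target, and, like every approach here, cannot see names built at runtime. orphan_images_strict is a hypothetical name; grep's -r, -o, and --include and sed's -E are GNU/BSD extensions.

```shell
#!/bin/bash
# Sketch: orphan_images_strict is a hypothetical helper, not from the answers above.
# Assumption: references look like url(x), url('x'), href="x" or src='x', with no
# spaces around '=' or after 'url('. Names built at runtime are not detected.
orphan_images_strict() {
    local dir=$1 images
    images=$(mktemp) || return 1

    # Every image basename on disk, sorted and de-duplicated.
    find "$dir" -type f \( -name '*.jpg' -o -name '*.png' -o -name '*.gif' \) \
        -exec basename {} \; | sort -u > "$images"

    # Extract each url()/href/src target, drop the prefix and optional quote,
    # cut off any ?query or #fragment, strip the directory, then keep the
    # image names that never appear as a target (comm -23).
    grep -rhoE --include='*.php' --include='*.css' --include='*.html' \
        "(url\(|href=|src=)['\"]?[^'\")>[:space:]]+" "$dir" \
        | sed -E "s/^(url\(|href=|src=)['\"]?//; s/[?#].*//; s|.*/||" \
        | sort -u | comm -23 "$images" -

    rm -f "$images"
}
```

With this version, an image that is only mentioned in an HTML comment is reported as orphaned, which matches the question's definition more closely than a plain filename search.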

OsakaWebbie
ls -R *jpg *gif *png | xargs basename > /tmp/patterns
grep -f /tmp/patterns *html

The first line (recursively, because your problem is ill-specified and I thought I'd be a little general) finds all images and strips off the directory portion using basename, saving the result as a list of patterns. The second line then greps for that list in all the HTML files.

drysdam
  • Didn't work. It kept saying missing parameter. I had to switch from ls/xargs to find -exec to get basename to work. I also could only run the command for jpg, then gif, then png, appending each to /tmp/patterns. With that in place, I can use grep -Rf /tmp/patterns mydir | cat to find lines matching my patterns, but how do I find the patterns that have no match in the files under mydir (and its subdirectories)? – Volomike Nov 17 '11 at 08:59
  • Oops, I missed that "not", sorry! Instead of `grep -f /tmp/patterns`, do `for p in $(cat /tmp/patterns); do grep -R $p *html; done`. Check grep's return code (or its output) and flag your orphans as needed. – drysdam Nov 17 '11 at 10:51