
I'm trying to loop through allURLs.txt and check if every entry in that file exists in PDFtoCheck.pdf. I know of a tool called pdfgrep, but can't seem to apply it to suit my objective.

#!/bin/bash

entriesMissing=0;

cat allURLs.txt | while read line
do
    # do something with $line here
    if [ ! -z echo `pdfgrep "$line" PDFtoCheck.pdf` ];
then
        echo "yay $line";

else
        echo "$line not found";
        entriesMissing=$[$entriesMissing+1];
fi

done

echo "DONE";
echo "There are $entriesMissing entries missing!";

Despite placing dummy values in allURLs.txt, entries which are present in allURLs.txt but not in PDFtoCheck.pdf are not reflected in the output. Any idea how to make it work as intended?

Jared Aaron Loo
  • I think your increment is not okay. Try `((entriesMissing++))`. – blackSmith Jun 08 '16 at 09:58
  • Because you're piping, the loop runs in a subshell, so the variable is lost when you exit the loop. Try searching for `variable is not set loop bash` or similar. – 123 Jun 08 '16 at 10:01
  • @blackSmith The increment is fine, it's just deprecated syntax for `$(())`. – 123 Jun 08 '16 at 10:01
  • @123: Sorry for my ignorance. Anyway, the following will do the trick for you, Aaron: `count=0; while read line; do x=$(pdfgrep -c "$line" PDFtoCheck.pdf); if [ $x -eq 0 ]; then ((count++)); echo 'lineNotFound'; else echo 'lineFound'; fi ; done < allURLs.txt` – blackSmith Jun 08 '16 at 10:28

2 Answers


Please note that piping into the loop (cat file | while ...) runs the loop in a subshell, so variables changed inside it are lost once the loop ends. You should use file redirection instead: while ...; do ...; done < file.
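
To make the subshell point concrete, here is a minimal sketch that counts three lines of made-up input in both ways. The variable names piped and redirected and the sample input are illustrative only, and the behaviour described assumes bash with default settings (no lastpipe).

```shell
#!/bin/bash
# Illustrative demo; the input lines a, b, c are made up.

# Pipeline: the while loop runs in a subshell, so the increments
# happen on a copy of "piped" that is discarded when the loop ends.
piped=0
printf '%s\n' a b c | while IFS= read -r line; do
  piped=$((piped + 1))
done

# Redirection: the loop runs in the current shell, so the count survives.
redirected=0
while IFS= read -r line; do
  redirected=$((redirected + 1))
done <<EOF
a
b
c
EOF

echo "piped=$piped redirected=$redirected"
```

Under those assumptions the script prints piped=0 redirected=3, which is the same effect that left entriesMissing at 0 in the question.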

As far as I can see, pdfgrep supports the -q (quiet) flag, so you can test its exit status directly in the if statement.

entriesMissing=0
while IFS= read -r line; do
   if pdfgrep -q -- "$line" PDFtoCheck.pdf; then
     printf "Found '%s'\n" "$line"
   else
     printf "'%s' not found\n" "$line"
     ((entriesMissing++))
   fi
done < allURLs.txt

printf "There are %d entries missing\n" "$entriesMissing"

I also changed the increment to ((entriesMissing++)).
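
The if pdfgrep -q ... test works because if branches on the command's exit status rather than on anything it prints. A small sketch of that pattern follows, using plain grep -q on a temporary text file as a stand-in for pdfgrep on a PDF; the check function, the temporary file and the example.com URLs are assumptions made for illustration.

```shell
#!/bin/bash
# Stand-in demo: grep -q on a temporary text file plays the role of
# pdfgrep -q on a PDF. The URLs below are made up.
haystack=$(mktemp)
printf '%s\n' 'http://example.com/a' 'http://example.com/b' > "$haystack"

check() {
  # -q prints nothing and reports the result via the exit status;
  # -- ends option parsing so a pattern starting with "-" is safe.
  if grep -q -- "$1" "$haystack"; then
    echo "found"
  else
    echo "missing"
  fi
}

first=$(check 'http://example.com/a')
second=$(check 'http://example.com/zzz')
echo "$first $second"
rm -f "$haystack"
```

No [ ... ] test or command substitution is needed around the command in the if line, which is what removes the quoting problems of the original [ ! -z echo ... ] test.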

Andreas Louv
  • Sorry this is not working... It says not found for all of the entries even though they exist in PDFtoCheck.pdf – Jared Aaron Loo Jun 08 '16 at 11:09
  • @JaredAaronLoo: Have you ever seen pdfgrep work from the command line with a one-word search target (the simplest case)? Good luck. – shellter Jun 08 '16 at 13:16
  • Yes, it works perfectly when used in the terminal. That's why it puzzles me that the boolean expression is always evaluating to true (even with dummy values inserted into allURLs.txt). It's never going into the else statement. – Jared Aaron Loo Jun 08 '16 at 13:26

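Extending my comment as an answer. I'm using the -c (count) option, which is also available in pdfgrep. The snippet below runs plain grep against a file named b; substitute pdfgrep and PDFtoCheck.pdf for the real check: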

entriesMissing=0 
while read line 
do 
   # do something with $line here
   if [ $(grep -c "$line" b) -eq 0 ] 
   then 
      ((entriesMissing++)) 
      echo "$line not found"
   else 
      echo "yay $line"
   fi 
done < allURLs.txt

echo "DONE"
echo "There are $entriesMissing entries missing!";

One thing I want to point out in your code: you are incrementing entriesMissing inside a subshell (created by the pipe), so the change is not reflected in the last line. Hope it helps.

blackSmith