I have the following bash script on Ubuntu, which checks whether my PDFs have been OCRed and OCRs them if they haven't. The problem is that some of my PDFs are a mix of OCRed and non-OCRed pages. So I want to add a condition to the if statement: if the number of lines or the number of words of text is below a certain threshold (say 100 lines or 1000 words), OCR the file anyway. I am completely new to Ubuntu, and I have added a couple of lines: the LINECOUNT assignment and the LINECOUNT test in the if statement.
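The condition described above (no fonts, or fewer than 100 lines, or fewer than 1000 words) can be sketched as a small bash function. The name needs_ocr and the variables MIN_LINES and MIN_WORDS are hypothetical, and the line and word counts are assumed to come from the PDF's text layer, for example from pdftotext "$1" - | wc -l and pdftotext "$1" - | wc -w.

```shell
#!/usr/bin/env bash
# Hypothetical thresholds, taken from the numbers suggested in the question.
MIN_LINES=100
MIN_WORDS=1000

# needs_ocr FONTS LINES WORDS
# Exit status 0 (OCR needed) when the font list is empty, the only font entry
# is "[none]", or the text is shorter than either threshold; otherwise 1.
needs_ocr() {
    local fonts=$1 lines=$2 words=$3
    # No embedded fonts usually means there is no text layer at all.
    if [ -z "$fonts" ] || [ "$fonts" = '[none]' ]; then
        return 0
    fi
    # Integer comparisons use -lt; inside [ ], "<" would be a redirection.
    if [ "$lines" -lt "$MIN_LINES" ] || [ "$words" -lt "$MIN_WORDS" ]; then
        return 0
    fi
    return 1
}
```

For example, needs_ocr "Arial" 40 300 returns status 0 (too little text), while needs_ocr "Arial" 500 5000 returns 1. In the script it could be called as needs_ocr "$MYFONTS" "$LINECOUNT" "$WORDCOUNT", where WORDCOUNT is a hypothetical extra variable holding the word count.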
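# List the font names used on the first 5 pages (tail skips pdffonts' two header lines)
MYFONTS=$(pdffonts -l 5 "$1" | tail -n +3 | cut -d' ' -f1 | sort | uniq)
# Added: count the lines of extractable text (pdftotext writes to stdout when the output file is "-")
LINECOUNT=$(pdftotext "$1" - | wc -l)
# Added: the LINECOUNT test; -lt compares integers, whereas "<" inside [ ] is a redirection
if [ "$MYFONTS" = '' ] || [ "$MYFONTS" = '[none]' ] || [ "$LINECOUNT" -lt 100 ]; then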
    echo "Not yet OCR'ed: $1 -------- Processing...."
    echo " "
    # -s (--skip-text): pages that already contain text are left as they are
    ocrmypdf -l eng -s "$1" "$1"
    echo " "
else
    echo "Already OCR'ed: $1"
    echo " "
fi
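Because the problem is PDFs that are only partly OCRed, a whole-file count can be fooled by a few text-heavy pages. A minimal per-page sketch, assuming poppler-utils (pdfinfo and pdftotext) is installed: it counts the words on each page and treats the file as needing OCR if any page falls below a hypothetical MIN_PAGE_WORDS threshold. The function names and the threshold value are illustrative assumptions, not part of the original script. Since the script already passes -s (--skip-text) to ocrmypdf, pages that already contain text should be left alone when such a mixed file is processed.

```shell
#!/usr/bin/env bash
# Hypothetical threshold: a page with fewer words than this is treated as
# lacking a text layer (genuinely blank pages will also trip it).
MIN_PAGE_WORDS=20

# page_is_sparse WORDS: exit status 0 when a page has too few words.
page_is_sparse() {
    [ "$1" -lt "$MIN_PAGE_WORDS" ]
}

# sparse_page_count FILE: print how many pages of FILE are sparse.
sparse_page_count() {
    local pdf=$1 pages p words sparse=0
    # pdfinfo prints a line such as "Pages:          12".
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
    for ((p = 1; p <= pages; p++)); do
        # -f/-l restrict pdftotext to a single page; "-" sends the text to stdout.
        words=$(pdftotext -f "$p" -l "$p" "$pdf" - | wc -w)
        if page_is_sparse "$words"; then
            sparse=$((sparse + 1))
        fi
    done
    echo "$sparse"
}

# Run only when executed directly with a file argument.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ $# -gt 0 ]; then
    if [ "$(sparse_page_count "$1")" -gt 0 ]; then
        echo "Not (fully) OCR'ed: $1 -------- Processing...."
        ocrmypdf -l eng -s "$1" "$1"
    else
        echo "Already OCR'ed: $1"
    fi
fi
```

Checking every page is slower than the original 5-page font test, but it targets the mixed files directly; the threshold would need tuning against real documents.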
The script was obtained from here: Batch OCRing PDFs that haven't already been OCR'd