I have a directory with a number of .pdf files in it.
I want to count how many files in the directory contain the word SSN, excluding the files that also contain the phrase testversion *.1.
For now I have the following command to list the files that contain the word SSN:

findstr /S /I /M ssn *.pdf  

So files that contain the word SSN and the phrase testversion 1.2 should be counted.
Files that contain the word SSN and the phrase testversion 1.1 should not be counted.

I think I need to do something with the /R regex option, but I have not mastered regex yet.

Mofi
Thom
  • Given that a PDF rarely carries only standard text, can you guarantee that `findstr.exe` is capable of matching either of your strings as text within those `.pdf` files? – Compo May 15 '20 at 12:16
  • Does this answer your question? [Best tool for inspecting PDF files?](https://stackoverflow.com/questions/3549541/best-tool-for-inspecting-pdf-files) – avery_larry May 15 '20 at 14:39
  • @Compo Yes, the PDF files are autogenerated, so they're all the same format, and the `findstr` works like a charm – Thom May 15 '20 at 14:41
  • @avery_larry well, I can't download anything because it's a PC on my jobsite. And I was wondering if it could work with a `.bat` file – Thom May 15 '20 at 14:43
  • You mention *testversion* `*.1`, `1.1` and `1.2`. So just `1.1` is not wanted and any other version is? or only `1.2` is wanted? – michael_heath May 15 '20 at 15:13
  • @michael_heath well, any file with `SSN` in it is good, except the ones that also contain the text `testversion *.1`, so any version where the last digit is 1 is not accepted – Thom May 15 '20 at 16:23

1 Answer

@echo off
setlocal

set "count=0"

for /f "delims=" %%A in ('findstr /i /m /s /r /c:"\<testversion [0-9][0-9]*\.[02-9]" "*.pdf"') do (
    for /f "delims=" %%B in ('findstr /i /r /m /c:"\<ssn\>" "%%~A"') do set /a "count+=1"
)

echo %count%

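The outer loop first finds files that contain testversion digits.digit where the last digit is not 1, since that is the condition, and the inner loop then checks each of those files for ssn. Both for loops return a filename, so you can inspect it with echo %%A or echo %%B where defined. Note that a file that contains ssn but no testversion phrase at all is not counted by this approach.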

In a regex, * means the previous character or set repeated 0 or more times. With dir and similar commands, * is a wildcard. Just to note the difference.
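To make the difference concrete, here is a small sketch that can be pasted at a command prompt. The sample string testversion 12.3 is made up for illustration and is not from the question.

```batch
rem Wildcard: here * stands for any run of characters in a file name.
dir /b *.pdf

rem Regex: here * repeats the item before it, so [0-9][0-9]* means one or more digits.
rem The sample text is hypothetical. It matches, so findstr prints the line back.
echo testversion 12.3| findstr /i /r /c:"testversion [0-9][0-9]*\.[02-9]"
```

The /c: option is what keeps the space inside the pattern literal. Without it, findstr treats the space as a separator between two independent search strings.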

Since pdf files are binary (plus some text) rather than all text, there is no guarantee that the word boundaries \< and \> will work well, so the patterns may need adjusting. Text-only files would be better, as findstr regular expressions are not designed for binary data.

This might be quicker just for counting:

@echo off
setlocal

set "count=0"

for /f "delims=" %%A in ('findstr /i /m /s /r /c:"\<testversion [0-9][0-9]*\.[02-9]" "*.pdf"') do (
    findstr /i /r /m /c:"\<ssn\>" "%%~A" >nul && set /a "count+=1"
)

echo %count%

This eliminates the second for loop.
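If files that contain ssn but no testversion phrase at all should also be counted, the logic can be inverted: start from the ssn files and skip those that contain a testversion ending in .1. This is only a sketch under that assumption, not something stated in the question, and it is untested against real pdf files. Like the [02-9] class above, it looks only at the first digit after the dot, so a version such as 1.10 is also skipped.

```batch
@echo off
setlocal

set "count=0"

rem Outer loop: every pdf, searched recursively, that contains ssn as a word.
for /f "delims=" %%A in ('findstr /i /m /s /r /c:"\<ssn\>" "*.pdf"') do (
    rem Count the file only when no testversion digits.1 phrase is found in it.
    findstr /i /r /c:"\<testversion [0-9][0-9]*\.1" "%%~A" >nul || set /a "count+=1"
)

echo %count%
```

The || operator runs set /a only when the inner findstr finds no match, which is the mirror image of the && used above.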

michael_heath