
I am trying to write a Windows batch file that looks through a specific HTML index file that looks something like this (simplified):

<a href=emergency.htm>Emergency Calls</a><br>
<a href=EmeRgency.htm>Emergency Calls</a><br>
<a href=Emergency.htm>Emergency Calls</a><br>
<a href=EMERGENCY.htm>Emergency Calls</a><br>
<a href=E911.htm>Emergency Calls</a><br>
<a href=e911.htm>Emergency Calls</a><br>

and prints every link whose filename contains an uppercase letter, so that those filenames can be corrected to all lowercase.

The following works in unix:

$ grep -v '^<a href=[^A-Z]*\.htm' helpindex.htm
<a href=EmeRgency.htm>Emergency Calls</a><br>
<a href=Emergency.htm>Emergency Calls</a><br>
<a href=EMERGENCY.htm>Emergency Calls</a><br>
<a href=E911.htm>Emergency Calls</a><br>

(the -v reverses the match)
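
For reference, here is a self-contained sketch of that Unix invocation as a small shell script. The function name find_uppercase_links and the temporary file are illustrative only, and LC_ALL=C is an added assumption so that the range A-Z means the 26 ASCII capitals regardless of the current locale.

```shell
# Recreate the simplified index from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href=emergency.htm>Emergency Calls</a><br>
<a href=EmeRgency.htm>Emergency Calls</a><br>
<a href=Emergency.htm>Emergency Calls</a><br>
<a href=EMERGENCY.htm>Emergency Calls</a><br>
<a href=E911.htm>Emergency Calls</a><br>
<a href=e911.htm>Emergency Calls</a><br>
EOF

# Hypothetical helper: print links whose file name has an uppercase letter.
# The pattern matches only when no uppercase letter appears between
# "href=" and ".htm"; -v then keeps the lines that did NOT match.
find_uppercase_links() {
    LC_ALL=C grep -v '^<a href=[^A-Z]*\.htm' "$1"
}

find_uppercase_links "$sample"
```

Run against the sample, this should print the same four lines as the session above.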

But using the UnxUtils grep under Windows, which is a direct port of unix grep, I can't come up with a way of quoting the regex that works, and I need one in order to use it in a batch file. I've tried both single quotes (') and double quotes ("), and also the -E switch, with no luck. Is there any way to do this using this particular toolset?

@janos led me to the findstr command in Windows but it still doesn't work. Looking at the findstr help I see:

FINDSTR [/B] [/E] [/L] [/R] [/S] [/I] [/X] [/V] [/N] [/M] [/O] [/P] [/F:file] [/C:string] [/G:file] [/D:dir list] [/A:color attributes] [/OFF[LINE]] strings [[drive:][path]filename[ ...]]

...
/V Prints only lines that do not contain a match. ...
/C:string Uses specified string as a literal search string. ...

Use spaces to separate multiple search strings unless the argument is prefixed with /C. For example, 'FINDSTR "hello there" x.y' searches for "hello" or "there" in file x.y. 'FINDSTR /C:"hello there" x.y' searches for "hello there" in file x.y.

However, this doesn't work either:

C:\home\sftp>findstr /V  /C:"^<a href=[^A-Z]*\.htm" helpindex.htm
<a href=emergency.htm>Emergency Calls</a><br>
<a href=EmeRgency.htm>Emergency Calls</a><br>
<a href=Emergency.htm>Emergency Calls</a><br>
<a href=EMERGENCY.htm>Emergency Calls</a><br>
<a href=E911.htm>Emergency Calls</a><br>
<a href=e911.htm>Emergency Calls</a><br>

Either findstr is garbage or there is some subtle difference from grep.

Steve Cohen
  • This works for me in the version that comes with Git Bash. You could also try the native windows commands `find.exe` and `findstr.exe`. They are similar to `grep` (nothing to do with the UNIX `find`) – janos Sep 27 '13 at 18:45
  • find does regex? I didn't know that. – Steve Cohen Sep 27 '13 at 18:47
  • findstr looks like it should work but it doesn't : findstr /V '^ – Steve Cohen Sep 27 '13 at 18:52
  • I don't have a Windows machine at hand to test it, but one of them needs the patterns enclosed within double quotes, even if it looks unnecessary, like `"simpleterm"`. I don't remember if it was `find` or `findstr`, so watch out for that. – janos Sep 27 '13 at 18:59

2 Answers


This works fine for me in Windows command console:

grep -v "^<a href=[^A-Z]*\.htm" helpindex.htm

FINDSTR does not work with [^A-Z] because it uses a non-standard collation sequence, so a range such as A-Z also takes in lowercase letters. See "Why does findstr not handle case properly (in some circumstances)?"

You can use FINDSTR to get your desired output using:

findstr /rvc:"^<a href=[^ABCDEFGHIJKLMNOPQRSTUVWXYZ]*\.htm" helpindex.htm

The /C option is needed to force the entire string to be considered one search term.

The /R option is needed to force interpretation of the search term as a regex. The default for the /C option is a string literal.
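
The same explicit-list idea carries over to grep, where a range like A-Z can also depend on the locale's collation order. The sketch below is only an illustration of that technique in shell, not part of the FINDSTR solution; the function name find_uppercase_links_explicit and the sample file are hypothetical.

```shell
# Small sample with one uppercase and two lowercase file names.
sample=$(mktemp)
printf '%s\n' \
    '<a href=emergency.htm>Emergency Calls</a><br>' \
    '<a href=E911.htm>Emergency Calls</a><br>' \
    '<a href=e911.htm>Emergency Calls</a><br>' > "$sample"

# Hypothetical helper: same inverted match as before, but the capitals are
# listed one by one, so no range has to be resolved through the locale.
find_uppercase_links_explicit() {
    grep -v '^<a href=[^ABCDEFGHIJKLMNOPQRSTUVWXYZ]*\.htm' "$1"
}

find_uppercase_links_explicit "$sample"
```

On this sample, only the E911.htm line should be printed.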

You might want to have a look at "What are the undocumented features and limitations of the Windows FINDSTR command?". It contains a long list of "gotchas".

Edit

UnxUtils is an old, outdated distribution of GNU unix utilities for Windows. You should get newer releases from GNU Coreutils: see Difference between UnxUtils and GNU CoreUtils

I believe I got my distribution of GNU Coreutils from http://gnuwin32.sourceforge.net/packages/coreutils.htm. I'm not sure if that is the most up-to-date package, but it should solve your grep problem. It provides a convenient package of many utilities.

Another option is to get individual GNU utilities for Windows from http://gnuwin32.sourceforge.net/packages.html

dbenham

You may use my FindRepl.bat program that works as you want. For example:

> type helpindex.htm
<a href=emergency.htm>Emergency Calls</a><br>
<a href=EmeRgency.htm>Emergency Calls</a><br>
<a href=Emergency.htm>Emergency Calls</a><br>
<a href=EMERGENCY.htm>Emergency Calls</a><br>
<a href=E911.htm>Emergency Calls</a><br>
<a href=e911.htm>Emergency Calls</a><br>

> FindRepl /V "^<a href=[^A-Z]*\.htm" < helpindex.htm
<a href=EmeRgency.htm>Emergency Calls</a><br>
<a href=Emergency.htm>Emergency Calls</a><br>
<a href=EMERGENCY.htm>Emergency Calls</a><br>
<a href=E911.htm>Emergency Calls</a><br>

You may download FindRepl.bat from this site

Aacini