
Is there a way to search for 2 or more spaces in a row between letters, using findstr from the Windows command line?

Example:

Hello world!  - no match wanted
Hello  world! - match wanted

What is the regular expression syntax for this?

Also, can you please help me understand the following command-line session (the difference between [ ] and [ ]*; the second command returns nothing):

c:\1>findstr -i -r  "view[ ]*data.sub" "view data sub.acf"
View Data Sub.ACF:            "].DATE_STAMP)>=[Forms]![MainNav]![View Data Sub]"
View Data Sub.ACF:            "].DATE_STAMP)<[Forms]![MainNav]![View Data Sub]"

c:\1>findstr -i -r  "view[ ]data.sub" "view data sub.acf"

c:\1>

PS: Just curious. I know about awk, Perl, C#, etc., but what about findstr?

tshepang
user1380142

2 Answers


If you just want to find two consecutive spaces:

findstr /C:"  " input.txt

Or in a case-insensitive regular expression:

findstr /R /I /C:"lo  wo" input.txt

The important bit is the /C: in front of the pattern. This tells findstr to treat the whole pattern as a single search string (literal by default, or a regular expression when /R is also given). Without it, findstr splits the pattern at spaces into multiple patterns, which, in my experience, is never what you want.
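To make the splitting behaviour concrete, here is a rough Python model of the literal search "lo  wo" with and without /C:. It is a simplified assumption for illustration, not how findstr is actually implemented, and the two helper names are made up.

```python
# Simplified model, NOT findstr's real implementation: without /C: the
# argument is split at spaces and a line matches if it contains ANY piece;
# with /C: the whole argument is one search string.


def matches_without_c(arg: str, line: str) -> bool:
    """Hypothetical model of `findstr "arg"` (literal search)."""
    pieces = arg.split()  # split() also drops the empty piece between two spaces
    return any(piece in line for piece in pieces)


def matches_with_c(arg: str, line: str) -> bool:
    """Hypothetical model of `findstr /C:"arg"` (literal search)."""
    return arg in line


for line in ["Hello world!", "Hello  world!"]:
    print(repr(line), matches_without_c("lo  wo", line), matches_with_c("lo  wo", line))
```

In this model the split version reports both lines, because "lo" on its own occurs in each of them; only the /C: version singles out the line with two spaces.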

Update

To do two or more spaces between letters:

findstr /R /I /C:"[a-z]   *[a-z]" input.txt

Note that there are three spaces in the pattern. This matches a letter, two spaces, zero or more further spaces (i.e. two or more spaces in total), and another letter.
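If it helps to check the logic outside findstr, the same pattern can be tried with Python's `re` module. This is only a sketch: Python's regex dialect is much richer than findstr's (and its case-insensitive `[a-z]` does not have the range quirk mentioned in the comments), so it mirrors the idea of the pattern rather than findstr's exact behaviour.

```python
import re

# Letter, two spaces, zero or more further spaces, letter (same text as the
# findstr pattern; Python only treats spaces as literal here, as findstr does).
FINDSTR_STYLE = re.compile(r"[a-z]   *[a-z]", re.IGNORECASE)

# Idiomatic Python spelling of the same idea. findstr's regex syntax (see
# findstr /?) has no {m,n} quantifier, which is why its pattern repeats the space.
PYTHON_STYLE = re.compile(r"[a-z] {2,}[a-z]", re.IGNORECASE)

samples = ["Hello world!", "Hello  world!", "Hello   world!", "Hello world!  - x"]
for s in samples:
    print(repr(s), bool(FINDSTR_STYLE.search(s)), bool(PYTHON_STYLE.search(s)))
```

The last sample has two spaces, but between "!" and "-" rather than between letters, so neither pattern matches it.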

arx
  • In fact I need to detect 2 **or more** spaces between "words". I know `findstr` pretty well; what I don't know is whether some trick lets it solve the problem I've just described. (IMHO, instead of using the "c:" switch, we could always use the plain old `find` rather than `findstr`.) – user1380142 May 07 '12 at 22:26
  • @user1380142: find doesn't do regular expressions, as far as I know. – Harry Johnston May 08 '12 at 05:46
  • The answer works as written, but be careful with regex character class ranges. They do not work the way you think. `[a-z]` matches all upper and lower case alphabetic characters (including non-English diacriticals) except for `Z`, even if the search is case sensitive. An equivalent search without the `/I` option would be `findstr /R /C:"[a-zZ]   *[a-zZ]" input.txt`. See the section titled "Regex character class ranges [x-y]" near the bottom of http://stackoverflow.com/a/8844873/1012053 for more info. – dbenham Jun 07 '12 at 02:36
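dbenham's point about ranges is easier to see with a toy model. The sketch below assumes, for ASCII letters only, that findstr compares range endpoints using a collation order in which each lowercase letter is immediately followed by its uppercase form (a A b B ... z Z) rather than by character code. That ordering is an assumption consistent with the comment above, not a documented findstr rule; the real order also interleaves accented letters.

```python
import string

# ASSUMED collation for illustration only: a A b B c C ... z Z
COLLATION = "".join(lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase))


def in_range(ch: str, lo: str, hi: str) -> bool:
    """True if ch lies between lo and hi (inclusive) in the assumed collation."""
    return COLLATION.index(lo) <= COLLATION.index(ch) <= COLLATION.index(hi)


covered = [ch for ch in string.ascii_letters if in_range(ch, "a", "z")]
missing = [ch for ch in string.ascii_letters if not in_range(ch, "a", "z")]
print("".join(missing))  # Z

# Adding an explicit Z, as in [a-zZ], closes the gap in this model.
print(all(in_range(ch, "a", "z") or ch == "Z" for ch in string.ascii_letters))  # True
```

Because "z" sits just before "Z" in this ordering, the range a-z stops one character short of covering every letter, which is exactly why the comment appends `Z` to the class.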

To find two or more consecutive spaces between letters:

C:\Users\harry> findstr /i /r /c:"o  [ ]*w" test.txt
Hello  world!
Hello   world!

Translation: match lines containing 'o', two spaces, zero or more spaces, 'w'. (The square brackets are redundant, but add clarity.) Presumably, you already know that findstr /? will give you a summary of the regular expression syntax?

As for the second part of your question: as arx already pointed out, the reason you aren't getting the results you expect is that you aren't using the /C flag. Consider your first command:

findstr -i -r  "view[ ]*data.sub" "view data sub.acf"

This is interpreted as a search for any line matching either of two regular expressions, view[ and ]*data.sub. I've done some experiments, and I believe the first regex is either discarded as improperly formed or interpreted as requiring a match from an empty set of characters. The second regex is interpreted as follows: zero or more of ']', 'data', one arbitrary character, 'sub'. As it happens, this matches the same lines as the single regex you thought you were using. Not so when you take away the asterisk:

findstr -i -r  "view[ ]data.sub" "view data sub.acf"

Now the second regex is interpreted as follows: exactly one ']', 'data', one arbitrary character, 'sub'. Since the string ']data' does not occur in your text, none of the lines match. Instead, you should specify /c:

findstr /i /r /c:"view[ ]data.sub" "view data sub.acf"

Now you are searching for a single regex: 'view', a space, 'data', an arbitrary character, 'sub'. This is presumably what you wanted to do.
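The three commands above can be replayed against one line of the .acf file with a small Python model. It rests on assumptions: the argument is split at spaces when /c: is absent, a piece that does not compile (here "view[") is simply discarded, following the guess above rather than any documented rule, and Python's `re` stands in for findstr's regex engine. The helper names are hypothetical.

```python
import re


def findstr_split(arg: str, line: str) -> bool:
    """Hypothetical model of `findstr /i /r "arg"`: split at spaces, match any piece."""
    for piece in arg.split():
        try:
            rx = re.compile(piece, re.IGNORECASE)  # /i
        except re.error:
            continue  # e.g. "view[" is an unterminated character class
        if rx.search(line):
            return True
    return False


def findstr_c(arg: str, line: str) -> bool:
    """Hypothetical model of `findstr /i /r /c:"arg"`: one regex, never split."""
    return re.search(arg, line, re.IGNORECASE) is not None


line = '            "].DATE_STAMP)>=[Forms]![MainNav]![View Data Sub]"'
print(findstr_split("view[ ]*data.sub", line))  # True, via "]*data.sub"
print(findstr_split("view[ ]data.sub", line))   # False, "]data" never occurs
print(findstr_c("view[ ]data.sub", line))       # True
```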

Harry Johnston
  • OK, I'm gradually getting the point. Now could you please explain ("translate") the following phrase in the `findstr /?` output: "`/C:string Uses specified string as a **literal** search string.`"? – user1380142 May 08 '12 at 19:50
  • Apparently, in this context, they are using the word "literal" to mean "not split into separate strings at spaces". This is unfortunate since they use the same word with a different meaning earlier on in the same text. Perhaps that phrase was written before findstr supported regular expressions, or just by a different person who wasn't thinking about regular expressions at the time. – Harry Johnston May 09 '12 at 01:10
  • >>...they are using the word "literal" to mean "not split into separate strings at spaces". Wow! Thanks a lot, Harry. Now I'm comfortable with how `findstr` interprets spaces. The "C:" switch is more useful than I thought, whatever its description **means**. I am impressed with stackoverflow.com! – user1380142 May 09 '12 at 14:27
  • @user1380142 - Actually the situation is a bit more complicated. Not only is `/C:"search"` never split at spaces, it is interpreted as a literal string by default. It can be treated as a regular expression if the `/R` option is specified. Contrast this with `"search"` passed as a simple argument. It is split at each space into multiple search strings, and each string may be interpreted as a literal or a regex, depending on the content of the first string. See the section titled "Default type of search: Literal vs Regular Expression" in http://stackoverflow.com/a/8844873/1012053 for more info. – dbenham Jun 06 '12 at 18:30