I have this string in a text file (test.txt):

BLA BLA BLA
BLA BLA
Found 11 errors and 7 warnings

I perform this command:

findstr /r "[0-9]+ errors" test.txt

I expect to get just the string 11 errors.

Instead, the output is:

Found 11 errors and 7 warnings

Can someone assist?

ohadinho

3 Answers

findstr always returns every full line that contains a match; it cannot return sub-strings only. Hence you need to do the sub-string extraction on your own. That said, there are some issues in your findstr command line that I want to point out:

The string parameter of findstr actually defines multiple search strings separated by white-space, so one search string is [0-9]+ and the other one is errors. The line Found 11 errors and 7 warnings in your text file is returned because of the word errors only; the numeric part is not part of the match, because findstr does not support the + quantifier (one or more occurrences of the previous character or class). To match the number as well, change that part of the search string to [0-9][0-9]*. To treat the whole string as one search string, you need to provide the /C option; since this defaults to literal search mode, you additionally need to add the /R option explicitly.

findstr /R /C:"[0-9][0-9]* errors" "test.txt"
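
To make the behaviour described above concrete, here is a toy model in Python of how the original command line is interpreted. It is only a sketch, not findstr itself: the helper name findstr_like is made up for illustration, and it models just the two points discussed here (white-space splits the argument into alternative search strings, and + is taken as a literal character).

```python
import re

def findstr_like(search, line):
    """Toy model of `findstr /R "<search>"` without /C (an approximation)."""
    # Without /C, white-space splits the argument into alternative search strings.
    pieces = search.split()
    # findstr has no `+` quantifier, so model `+` as a literal plus sign.
    patterns = [piece.replace("+", r"\+") for piece in pieces]
    # A line is reported as soon as any one of the pieces matches.
    return any(re.search(pattern, line) for pattern in patterns)

# Matches through the piece "errors" alone; the digits play no role.
print(findstr_like("[0-9]+ errors", "Found 11 errors and 7 warnings"))
# Neither piece matches this line.
print(findstr_like("[0-9]+ errors", "11 warnings"))
# Matches through "[0-9]+": a digit followed by a literal plus sign.
print(findstr_like("[0-9]+ errors", "5+ items"))
```

Under this model, the line from test.txt is reported whether or not it contains a number, which is consistent with the output seen in the question.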

The corrected search string would, however, also match strings like x5 errorse; to avoid that, you could use the word boundaries \< (beginning of word) and \> (end of word). (Alternatively, you could include a space on either side of the search string, so /C:" [0-9][0-9]* errors ", but this fails if the match appears at the very beginning or end of a line.)

Taking all of the above into account, the corrected and improved command line looks like this:

findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"

This will return the entire line containing a match:

Found 11 errors and 7 warnings

If you want to return such lines only and exclude lines like 2 errors are enough or 35 warnings but less than 3 errors, you could of course extend the search string accordingly:

findstr /R /C:"^Found [0-9][0-9]* errors and [0-9][0-9]* warnings$" "test.txt"
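
As a rough cross-check of the three search strings above, the following Python sketch runs them against the example lines mentioned in this answer. It assumes that Python's \b is a close enough stand-in for findstr's \< and \> for these inputs; the names LOOSE, BOUNDED, ANCHORED and classify are made up for illustration.

```python
import re

# Python's \b stands in for findstr's \< and \> here (an approximation).
LOOSE = re.compile(r"[0-9][0-9]* errors")
BOUNDED = re.compile(r"\b[0-9][0-9]* errors\b")
ANCHORED = re.compile(r"^Found [0-9][0-9]* errors and [0-9][0-9]* warnings$")

def classify(line):
    """Report which of the three patterns find a match in the line."""
    return {
        "loose": bool(LOOSE.search(line)),
        "bounded": bool(BOUNDED.search(line)),
        "anchored": bool(ANCHORED.search(line)),
    }

for line in (
    "Found 11 errors and 7 warnings",
    "x5 errorse",
    "2 errors are enough",
    "35 warnings but less than 3 errors",
):
    print(line, "->", classify(line))
```

Only the first line satisfies all three patterns; x5 errorse is rejected once word boundaries are added, and the other two lines are rejected only by the fully anchored pattern.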

Anyway, to extract the portion 11 errors there are several options:

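  1. a for /F loop can parse the output of findstr and extract certain tokens, here the second and third space-separated ones (this is batch-file syntax; at the command prompt, use %E and %F instead of %%E and %%F):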

    for /F "tokens=2-3 delims= " %%E in ('
        findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"
    ') do echo(%%E %%F
    
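  2. the sub-string replacement syntax could also be used; the first set removes everything up to and including the first space, and the second cuts the string off at " and " by turning the remainder into a rem comment: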

    for /F "delims=" %%L in ('
        findstr /R /C:"\<[0-9][0-9]* errors\>" "test.txt"
    ') do set "LINE=%%L"
    set "LINE=%LINE:* =%"
    set "LINE=%LINE: and =" & rem "%"
    echo(%LINE%
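
For comparison, in a tool with match extraction the same sub-string can be pulled out directly, without tokenising or trimming the line. The following Python sketch is only an illustration and not part of the batch approach above; the function name extract_error_counts and the inline sample text (a copy of the test.txt contents from the question) are assumptions for the example.

```python
import re

# One or more digits, a space, and the whole word "errors".
ERRORS = re.compile(r"\b[0-9]+ errors\b")

def extract_error_counts(text):
    """Return every 'N errors' sub-string found in the text."""
    return ERRORS.findall(text)

# Sample text mirroring the contents of test.txt from the question.
sample = "BLA BLA BLA\nBLA BLA\nFound 11 errors and 7 warnings\n"
print(extract_error_counts(sample))
```

To run it against the real file, the sample string could be replaced by the result of open("test.txt").read().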
    
aschipfl

The findstr tool cannot be used to extract matches only. It is much easier to use PowerShell for this.

Here is an example:

$input_path = 'c:\ps\in.txt'
$output_file = 'c:\ps\out.txt'
$regex = '[0-9]+ errors'
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file

See the Windows PowerShell: Extracting Strings Using Regular Expressions article on how to use the script above.

Wiktor Stribiżew
  • Is there any other cmd tool with which this can be done? I do not want to use PowerShell for this task – ohadinho Nov 24 '16 at 08:27
  • On Windows? Well, there are not many options that support real regex. PowerShell is built in, so why not use it? If you insist, what about a VBScript solution? – Wiktor Stribiżew Nov 24 '16 at 08:30
  • It might be much easier in PS, but what's the way to do this in "not powershell"? If `findstr` can't do this on its own, which command _can_ be used? – Mike 'Pomax' Kamermans Mar 04 '20 at 22:55
  • @Mike'Pomax'Kamermans Any tool that can extract regex matches. PS is not the only option, but it seems the handiest since it is shipped with Windows. – Wiktor Stribiżew Mar 04 '20 at 22:58
  • Except parts can be added, whereas it sounds like there is nothing that can be added to `findstr` to make it work, period. In which case it's not so much "much easier to use PowerShell" but "you'll have to use PowerShell, or some other tool with real regexp capabilities"? – Mike 'Pomax' Kamermans May 28 '20 at 15:31

Piping type (or cat) into grep can do this.

This allows for any number of errors of up to four digits:

type c:\temp\test.txt | grep -Eo "[0-9]{1,4} errors"
11 errors

If the error count can have more than four digits, raise the upper bound to the largest expected number of digits, or use [0-9]+ to accept any number of digits.

For an exact, case-sensitive match:

type c:\temp\test.txt | grep -o "11 errors"
11 errors

Or a case-insensitive match with cat:

cat c:\temp\test.txt | grep -o -i "11 ERRORS"
11 errors

noni
  • The question is about `cmd` (the Windows command line), which doesn't (natively) support `grep` or `cat` – Stephan Dec 16 '22 at 20:35