
I am trying to set up a batch file that uses findstr to remove all lines matching a certain pattern. The source file I want to analyse looks like this (I replaced every value except the one in field 16, counting from 0, with its field number; normally the values are names, URLs, empty strings, or single characters like Y/N):

ProductCode|SkuID|Bestellnr|ProductName|locale_de-DE_ProductName|locale_it-IT_ProductName|locale_nl-NL_ProductName|locale_fr-FR_ProductName|locale_en-GB_ProductName|locale_da-DA_ProductName|locale_cs-CZ_ProductName|locale_sv-SE_ProductName|locale_pl-PL_ProductName|locale_sk-SK_ProductName|ProductType|ProduktLink|OnlineAvailability|ProductNumber|IsProdukt|TerritoryAvailability|Category|SubCategory|ImageLink|Status|Flag0|Flag1|Flag2
0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|Y|17|18|19|20|21|22|23|24|25|26
0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|N|17|18|19|20|21|22|23|24|25|26
0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|N|17|18|19|20|21|22|23|24|25|26
0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|Y|17|18|19|20|21|22|23|24|25|26
0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|Y|17|18|19|20|21|22|23|24|25|26

I just want to exclude all lines that have an N in field 16 (counting from 0, i.e. the 17th column, OnlineAvailability). So I came up with a regex pattern that does this:

^([^|]*\|){16}N

Demo showing that the regex works (online resource):

https://regex101.com/r/mE5HVR/1/
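A quick way to confirm which field the pattern tests is to run the same regex in an engine that supports groups and `{16}`. The sketch below assumes Python's `re` module as that engine; it is only a cross-check of the pattern's logic, not a batch solution. `keep` is a hypothetical helper name, and the rows are copied from the sample data.

```python
import re

# The question's pattern: 16 "field|" groups from the start of the line, then N.
# Sixteen pipes precede the tested character, so it checks field index 16
# (0-based), i.e. the 17th column (OnlineAvailability in the header).
PATTERN = re.compile(r"^([^|]*\|){16}N")


def keep(line):
    """Hypothetical helper: True if the line survives, like findstr /V would."""
    return PATTERN.search(line) is None


rows = [
    "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|Y|17|18|19|20|21|22|23|24|25|26",
    "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|N|17|18|19|20|21|22|23|24|25|26",
]

for row in rows:
    # Print field 16 next to the keep/drop decision.
    print(row.split("|")[16], keep(row))
# Y True
# N False
```

Note that `{16}N` also matches a field that merely starts with N (for example `No`); appending `\|`, as in `^([^|]*\|){16}N\|`, restricts it to a field that is exactly N.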

When I try to use it with findstr like this:

FINDSTR /V "^([^|]*\|){16}N" H:\BatchTest\LineProcessing\myfile.txt >H:\BatchTest\LineProcessing\result.txt
pause
exit

I always get the full file back, and it seems like the regex is not even used. Can anybody point me in the right direction to find my mistake? I tried to get more information from the post What are the undocumented features and limitations of the Windows FINDSTR command?, but I couldn't find my flaw, or I overlooked it.

Any help is appreciated.

  • See the help of `findstr /?` or visit http://ss64.com/nt/findstr.html to see that findstr's RegEx capabilities are rather limited. I suggest switching to PowerShell –  Aug 02 '18 at 12:04
  • According to the [documentation](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/findstr), `findstr` only supports a tiny subset of regular expressions... – aschipfl Aug 02 '18 at 12:05
  • I had read both links before; unfortunately I am not sure which part of my expression is not supported. Is it the {16} part? Everything else is covered in that documentation (from my perspective). And I cannot switch to PowerShell because the calling program will only execute batch files. – Johannes Schapdick Aug 02 '18 at 12:11
  • findstr only supports the `*` quantifier, not even `+`. –  Aug 02 '18 at 12:13
  • There is no + quantifier in my expression. – Johannes Schapdick Aug 02 '18 at 12:15
  • I thought I could come up with something like `findstr /V "^[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|N|" "myfile.txt"`, but unfortunately this leads to an error (`FINDSTR: Search string too long.`), because there are too many character classes `[]`, I believe... – aschipfl Aug 02 '18 at 12:15
  • From what I read, there is a maximum of 15 groups like this. @aschipfl – Johannes Schapdick Aug 02 '18 at 12:18
  • Yes, but I receive the error even with 14 groups, and the described error message is different; anyway... – aschipfl Aug 02 '18 at 12:21
  • Dave's Findstr documentation and the help make it pretty clear what can be used. WYSIWYG. – Squashman Aug 02 '18 at 12:32
  • `PowerShell` is only an executable in much the same way as `FindStr` is, so there should be no obvious reason why you cannot invoke it directly from the batch file and benefit from its regular expressions. In fact it would also be possible to use `PowerShell` to read the file as a pipe-delimited csv and exclude lines whose 16th field does not match, _(once again this could be invoked from your batch file too)_! – Compo Aug 02 '18 at 13:33

4 Answers


Invoke PowerShell as a tool from the batch file:

@Echo off
Set "FileIn=H:\BatchTest\LineProcessing\myfile.txt"
Set "FileOut=H:\BatchTest\LineProcessing\result.txt"
powershell -NoP -C "Get-Content '%FileIn%' |Where-Object {$_ -notmatch '^([^|]*\|){16}N'}"  >"%FileOut%"
pause
exit

Using PowerShell aliases shortens the command:

powershell -NoP -C "gc '%FileIn%'|?{$_ -notmatch '^([^|]*\|){16}N'}"  >"%FileOut%"

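According to the documentation, findstr has only very limited support for regular expressions: it knows `.`, `*`, `^`, `$`, character classes `[...]`, the word anchors `\<` and `\>`, and `\` escapes, but no grouping `( )`, no `{n}` quantifier and no alternation, so `|` is just an ordinary character. Your pattern is therefore read as a literal `(` at the start of the line followed by a literal `{16}`, it never matches, and /V passes every line through.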

You might want to try something like this:

findstr /V "^[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|[^|]*|N|" "myfile.txt"

But unfortunately, this results in an error (FINDSTR: Search string too long.), because there are too many character classes [] specified, I think (refer to the useful thread you already referenced in your question: What are the undocumented features and limitations of the Windows FINDSTR command?).
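Because findstr has no `{16}`, the repetition has to be spelled out by hand, which is what makes the search string so long. As an illustration only, the following Python sketch builds that spelled-out findstr pattern, converts it to Python syntax (escaping the pipes that findstr treats as literal), and checks that it behaves like the `{16}` form on two sample rows. `FINDSTR_STYLE` and `to_python_regex` are hypothetical names.

```python
import re

# findstr has no {n} quantifier, so the field prefix is written out 16 times.
# In findstr syntax "|" is an ordinary character, so it needs no escaping there.
FINDSTR_STYLE = "^" + "[^|]*|" * 16 + "N|"


def to_python_regex(findstr_pattern):
    """Hypothetical helper: escape pipes outside [...] so Python treats them literally."""
    out, in_class = [], False
    for ch in findstr_pattern:
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        elif ch == "|" and not in_class:
            out.append("\\")  # in Python's re a bare "|" would mean alternation
        out.append(ch)
    return "".join(out)


spelled_out = re.compile(to_python_regex(FINDSTR_STYLE))
quantified = re.compile(r"^([^|]*\|){16}N\|")

rows = [
    "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|Y|17|18|19|20|21|22|23|24|25|26",
    "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|N|17|18|19|20|21|22|23|24|25|26",
]
for row in rows:
    # Both forms must agree on every row.
    assert bool(spelled_out.search(row)) == bool(quantified.search(row))

print(FINDSTR_STYLE.count("[^|]*"), "character classes")  # 16 character classes
```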


However, I can think of a workaround: use a for /F loop to read the file and split off the 16 columns that precede the one of interest. This only works if none of the preceding columns are empty, because for /F treats consecutive delimiters as a single one:

@echo off
set "HEAD=" & set "FLAG="
for /F "usebackq tokens=1-16* delims=| eol=|" %%A in ("%~1") do (
    if not defined HEAD (
        set "HEAD=#" & set "FLAG=#"
    ) else (
        set "LINE=%%Q"
        cmd /V /C echo(!LINE!| > nul findstr "^N|" || set "FLAG=#"
    )
    if defined FLAG (
        echo(%%A^|%%B^|%%C^|%%D^|%%E^|%%F^|%%G^|%%H^|%%I^|%%J^|%%K^|%%L^|%%M^|%%N^|%%O^|%%P^|%%Q
        set "FLAG="
    )
)

This moves the column of interest to the front of %%Q, so findstr can now test it with the simple pattern ^N|.

Or here is another approach not using findstr at all:

@echo off
set "HEAD=" & set "FLAG="
for /F "usebackq tokens=1-17* delims=| eol=|" %%A in ("%~1") do (
    if not defined HEAD (
        set "HEAD=#" & set "FLAG=#"
    ) else (
        if not "%%Q"=="N" set "FLAG=#"
    )
    if defined FLAG (
        echo(%%A^|%%B^|%%C^|%%D^|%%E^|%%F^|%%G^|%%H^|%%I^|%%J^|%%K^|%%L^|%%M^|%%N^|%%O^|%%P^|%%Q^|%%R
        set "FLAG="
    )
)

If any of the columns could be empty, you could use the following adapted code:

@echo off
set "LINE="
for /F usebackq^ delims^=^ eol^= %%L in ("%~1") do (
    if not defined LINE (
        set "LINE=%%L"
        echo(%%L
    ) else (
        set "LINE=%%L"
        setlocal EnableDelayedExpansion
        for /F "tokens=17 delims=| eol=|" %%K in ("_!LINE:|=|_!") do (
            endlocal
            set "ITEM=%%K"
            setlocal EnableDelayedExpansion
        )
        if not "!ITEM:~1!"=="N" echo(!LINE!
        endlocal
    )
)

This temporarily prefixes every item with an underscore _ before extracting the value (the ~1 substring strips it again) and checking it against N, so no column appears empty to for /F.
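The effect of the underscore trick can be illustrated outside of cmd. This Python sketch is a rough emulation under the assumption that for /F with delims=| simply drops empty fields; it ignores eol, the tokens limit and other for /F details. `for_f_tokens` and `underscore_trick` are hypothetical names.

```python
def for_f_tokens(text):
    # Rough emulation of for /F with delims=|: consecutive delimiters act
    # as one, so empty fields simply disappear from the token list.
    return [tok for tok in text.split("|") if tok]


def underscore_trick(text):
    # Mirrors "_!LINE:|=|_!": prefix the line and every field with "_" so no
    # field is empty, then strip the "_" again (like !ITEM:~1!).
    return [tok[1:] for tok in for_f_tokens("_" + text.replace("|", "|_"))]


line = "a||c|N|e"  # field 1 is empty, field 3 holds the flag
print(for_f_tokens(line))      # ['a', 'c', 'N', 'e']       positions shift
print(underscore_trick(line))  # ['a', '', 'c', 'N', 'e']   positions preserved
```

Without the prefix, the flag moves from position 3 to position 2, which is why the plain tokens=1-17 approach breaks on empty columns.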

aschipfl

User aschipfl has explained why both the simple regex and the workaround regex fail. There is no simple solution using FINDSTR.

You can use my JREPL.BAT regex utility to easily solve the problem. JREPL is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward; no third-party exe file is required.

From the command line you could simply use:

jrepl "^([^|]*\|){16}(?!N\|)" "" /k 0 /f myfile.txt /o result.txt

Within a batch file you need to use CALL, which unfortunately doubles the quoted ^. The /XSEQ option is added so that the extended escape sequence \c can be used in place of ^.

call jrepl "\c([\c|]*\|){16}(?!N\|)" "" /k 0 /xseq /f myfile.txt /o result.txt

The solutions above only preserve lines that have at least 17 columns and do not have N as the 17th column, which means they also exclude lines with fewer than 17 columns.

If you want to use your original strategy of simply excluding lines that have N as the 17th column, use

jrepl "" "" /exc "/^([^|]*\|){16}N\|/" /k 0 /f myfile.txt /o result.txt

or

call jrepl "" "" /exc "/\c([\c|]*\|){16}N\|/" /k 0 /f myfile.txt /o result.txt

/XSEQ is not required because the /EXC regex automatically supports the extended escape sequences.
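To see how the /K 0 (keep matching lines) and /EXC (exclude matching lines) variants differ, the two patterns can be compared with Python's `re`, which supports the constructs used here (groups, `{16}`, negative lookahead) just as JScript's engine does. This is a sketch with hypothetical, trimmed sample lines, not a JREPL invocation; the JREPL-specific `\c` escape is not involved.

```python
import re

keep_if = re.compile(r"^([^|]*\|){16}(?!N\|)")  # /K 0: keep lines that match
drop_if = re.compile(r"^([^|]*\|){16}N\|")      # /EXC: drop lines that match

lines = {
    "flag_Y": "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|Y|17|18",
    "flag_N": "0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|N|17|18",
    "short": "0|1|2|3",  # fewer than 17 columns
}

for name, text in lines.items():
    kept_by_lookahead = keep_if.search(text) is not None
    kept_by_exclusion = drop_if.search(text) is None
    print(name, kept_by_lookahead, kept_by_exclusion)
# flag_Y True True
# flag_N False False
# short False True
```

Only the short line is treated differently: the lookahead variant drops it, while the exclusion variant keeps it.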

dbenham

To supplement my earlier comment and to go alongside the existing PowerShell answer, here's a batch file line which utilises PowerShell but bypasses the need to perform a RegEx.

It reads the file as a pipe-delimited CSV and outputs the lines whose OnlineAvailability field matches Y (this can be changed to -NotMatch 'N'):

@PowerShell -NoP "IpCSV 'H:\BatchTest\LineProcessing\myfile.txt' -Del '|'|?{$_.OnlineAvailability -Match 'Y'}|EpCSV 'H:\BatchTest\LineProcessing\result.txt' -NoT -Del '|'"

The result should be a properly formed CSV, with double-quoted fields.


If you would prefer not to have those double-quoted fields, perhaps this modification would be suitable:

@PowerShell -NoP "IpCSV 'H:\BatchTest\LineProcessing\myfile.txt' -Del '|'|?{$_.OnlineAvailability -Match 'Y'}|ConvertTo-CSV -NoT -Del '|'|%%{$_ -Replace '""',''}|Out-File 'H:\BatchTest\LineProcessing\result.txt'"
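The same select-by-column-name idea can be sketched with Python's standard `csv` module, purely to illustrate the approach. The sample text is a hypothetical, trimmed-down version of the question's file and `filter_available` is a hypothetical helper. It keeps rows whose OnlineAvailability is not N (roughly the -NotMatch 'N' variant, but with an exact comparison) and uses QUOTE_ALL to produce double-quoted fields similar to the first command's output.

```python
import csv
import io

# Hypothetical sample: header plus two rows, trimmed to a few columns.
SAMPLE = "ProductCode|ProductType|OnlineAvailability|Status\n0|14|Y|23\n1|14|N|23\n"


def filter_available(text):
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    out = io.StringIO()
    # QUOTE_ALL quotes every field, similar to the Export-Csv output.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, delimiter="|",
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Select by column name instead of counting pipes with a regex.
        if row["OnlineAvailability"] != "N":
            writer.writerow(row)
    return out.getvalue()


print(filter_available(SAMPLE), end="")
# "ProductCode"|"ProductType"|"OnlineAvailability"|"Status"
# "0"|"14"|"Y"|"23"
```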
Compo