
Is there a way to filter the output of a batch file using regex or anything similar?

In short, this is my scenario: I need to clean up log files for further analysis, but because of their size I want to shrink them first. The original file is space-delimited, and I know that the 5th item on each line is what I need.

So far so good; the following batch file more or less gives me what I need:

@echo off & setLocal enableDELAYedexpansion
@title = logger


for /f "tokens=*" %%a in (test.log) do call :getURI %%a

pause
goto :eof

:getURI

echo %5 >> cleaned.txt
goto :eof

:eof

This gives me the desired output, as follows:

some_url.html
test.html
some_other_url.html
test.html
test.html
yet_another_url.html
...

Now, it still takes an awful lot of time to generate this file, so I was wondering whether there are more efficient ways to do this, and whether the output could also be filtered. For instance, the output still contains quite a few test.html entries (fictional example), and I would prefer to strip them out up front, so that the result becomes:

some_url.html
some_other_url.html
yet_another_url.html
...

Any advice?

Wokoman

4 Answers


Option 1 - pure native batch

@echo off
setlocal disableDelayedExpansion
>cleaned.txt (
  for /f "tokens=5" %%A in (
    'findstr /rvc:"^ *[^ ]*  *[^ ]*  *[^ ]*  *[^ ]*  *test.html" test.log'
  ) do echo %%A
)

The following strategies, all used in the code above, improve performance:

  • Use FINDSTR to pre-filter out all test.html lines
  • Eliminate CALL by using FOR /F to parse the 5th token directly
  • Redirect only once using an outer parentheses block

Update

As discussed in this follow-up question, this solution becomes horribly slow when dealing with very large files. Good performance can be restored by writing the FINDSTR output to a temporary file and letting FOR /F read that file instead of the command output.

@echo off
setlocal disableDelayedExpansion
findstr /rvc:"^ *[^ ]*  *[^ ]*  *[^ ]*  *[^ ]*  *test.html" test.log >test.log.mod
>cleaned.txt (for /f "tokens=5" %%A in (test.log.mod) do echo %%A)
del test.log.mod


Option 2 - my REPL.BAT utility

I have written a hybrid JScript/batch utility called REPL.BAT that can directly give the desired result very efficiently. It performs a regex search and replace on stdin and writes the result to stdout. It is pure script that will run natively on any modern Windows machine from XP onward.

type test.log | repl "^ *(\S+ +){4}(?!test.html |test.html$)(\S*).*" $2 a >cleaned.txt
dbenham

This should run faster than your original code, and also eliminate test.html:

@echo off & setlocal disableDelayedExpansion
title logger

(for /f "tokens=5" %%a in (test.log) do (
   if "%%a" neq "test.html" echo %%a
)) > cleaned.txt

pause
Aacini

@echo off
setlocal enableextensions disabledelayedexpansion

( for /f "tokens=5" %%a in (test.log) do @echo(%%a
) | findstr /v /b /c:"test.html" /c:"another_test.html" > cleaned.txt

endlocal

The for command tokenizes the lines of the input file, splitting on spaces (the default behaviour). Only the 5th token (tokens=5) is of interest, so it is echoed. The output of the for command is piped to findstr, which prints only the lines that do not (/v) start (/b) with any of the indicated literal strings (/c:"...").

MC ND
  • Thanks MC, unfortunately this did not really do the trick; it seems to work only if both exceptions are on the same line, which doesn't really happen. However, I managed to build on your example, and the following did work: findstr /v "test another_test". Maybe not the safest way to do this, but it works in my case – Wokoman May 28 '14 at 13:07
  • @Wokoman, not sure where you found a problem. For me, it works as long as the text is at the start of the line (which it should be if this is the 5th field). – MC ND May 28 '14 at 16:59

You could use grep's -v option, which inverts the match and gives you all the lines that do not match the search pattern.

grep -v "test.htm" test.log > newfile.log

An explanation of -v is available on the GNU website.

Commands can be piped together as many times as you want:

 grep -v "firstunwanted" * | grep -v "secondunwanted" > newfile.log
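
Note that grep -v drops whole lines, while the question only wants the 5th field of each line. Where a POSIX shell and awk are available, both steps can be combined in one pass. The following is only a minimal sketch under assumptions: the function name extract_uris and the contents of sample.log are made up for illustration, and the filter matches the 5th field exactly rather than by regex.

```shell
# Hypothetical helper: print the 5th whitespace-delimited field of each line,
# skipping short lines and lines whose 5th field is exactly "test.html".
extract_uris() {
  awk '$5 != "" && $5 != "test.html" { print $5 }' "$1"
}

# Made-up sample log in the layout described in the question.
printf '%s\n' \
  'a b c d some_url.html x' \
  'a b c d test.html x' \
  'a b c d some_other_url.html x' > sample.log

# Single pass over the input, single redirection for the output.
extract_uris sample.log > cleaned.txt
```

Because awk reads the file once and writes through one redirection, this avoids the per-line overhead that makes the original CALL-based loop slow.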
miltonb