
I'm basing this on "Batch file to filter strings from text and replace... but in a single batch instead of multiple?", but I still can't work out how to do it.

I have an input text flat file with data stored in fixed columns. I need to use a Windows batch file to read this file line by line. If a specific column holds a specific value, I need to write that line to an output file. The input file has no header, so I don't have to worry about that.

For example, say it has 3 columns:

AAAA 1111 jjjjj
BBBB 2222 kkkkk
CCCC 1111 llll

I need to filter lines whose 2nd column is 1111, so the new file must have:

AAAA 1111 jjjjj
CCCC 1111 llll

What's the code to make this filter and print to the output file?

Hikari

1 Answer


As long as you are filtering on the 8th column or earlier, all you need is a single FINDSTR command with an appropriately constructed regular expression. Note that FINDSTR regular expression support is very limited and non-standard.

Here is a solution for the example in your question - matching 1111 in the 2nd column:

findstr /rc:"^[^ ][^ ]*  *1111 " input.txt >output.txt

Here is what it would look like to match 1111 in the 5th column:

findstr /rc:"^[^ ][^ ]*  *[^ ][^ ]*  *[^ ][^ ]*  *[^ ][^ ]*  *1111 " input.txt >output.txt

This fails if you attempt to filter on the 9th or later column because FINDSTR is limited to a maximum of 15 character class ([x]) terms. See "What are the undocumented features and limitations of the Windows FINDSTR command?" for more info.
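For comparison, here is a minimal pure-batch sketch that avoids the FINDSTR character class limit by splitting each line with FOR /F. This is a different technique from the FINDSTR regex above, not a variant of it. It assumes the columns are separated by spaces or tabs, and the file names, column number, and value are placeholders to adjust.

```batch
@echo off
setlocal disableDelayedExpansion

rem Hypothetical settings: adjust the file names, column number, and value.
set "infile=input.txt"
set "outfile=output.txt"
set "col=2"
set "value=1111"

rem The outer loop reads each line whole, because an empty delims keeps it unsplit.
rem The inner loop splits that line on spaces and tabs and picks the chosen token.
>"%outfile%" (
  for /f "usebackq delims=" %%L in ("%infile%") do (
    for /f "tokens=%col%" %%A in ("%%L") do (
      if "%%A"=="%value%" echo(%%L
    )
  )
)
```

With the sample data from the question, this should write the AAAA and CCCC lines to output.txt. Caveats: FOR /F skips blank lines and lines that begin with a semicolon, the documented token limit is 31, and it is much slower than FINDSTR on large files.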

A more robust alternative is to use my JREPL.BAT regex utility. JREPL is pure script (hybrid batch/JScript) that runs on any Windows version from XP onward - no 3rd party exe file required.

jrepl "^\S+\s+1111\s" "" /k 0 /f input.txt /o output.txt

If you wanted to match 1111 in the 25th column instead of the 2nd column, then a JREPL solution would look like:

jrepl "^(\S+\s+){24}1111\s" "" /k 0 /f input.txt /o output.txt

Since JREPL is a batch script, you would need to use CALL JREPL if you put the command within another batch script.
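A minimal sketch of what that could look like inside a loop over several files. The `*.txt` mask and the `filtered` output folder are hypothetical, and the sketch assumes JREPL.BAT is in the current folder or somewhere on the PATH.

```batch
@echo off
rem Hypothetical output folder, kept separate so the loop does not pick up its own output.
if not exist "filtered" md "filtered"

rem CALL is needed here: without it, control does not return properly from JREPL.BAT.
for %%F in (*.txt) do (
  call jrepl "^\S+\s+1111\s" "" /k 0 /f "%%F" /o "filtered\%%~nxF"
)

echo Finished filtering.
```

Each input file gets a filtered copy with the same name under the output folder, and the script carries on after the loop ends.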

dbenham
  • Thank you. Unfortunately the real file must be filtered on the 8th column, and it failed using findstr saying `FINDSTR: Search string too long`. – Hikari Jul 10 '17 at 02:41
  • I tried jrepl, but I must do a FOR to look for files to be processed and pass %%f as parameter to it. The first iteration works, but the second has a strange behavior where the code is echoed instead of processed, and when the FOR ends the script is exited instead of continuing. I did some tests and it's jrepl the reason for the failure. – Hikari Jul 10 '17 at 02:42
  • @Hikari - It sounds like you failed to use CALL JREPL as I instructed. JREPL is a batch script, so you must use CALL JREPL if you want to include the command within another batch script. – dbenham Jul 10 '17 at 03:46
  • @Hikari - The FINDSTR *"search string too long"* error is interesting. I confirmed that the search fails when trying to get the 8th column. Normally a FINDSTR regex is limited to length 254 (127 on XP). But your search is well within those constraints. There is some other mechanism in play here that I have never seen before. – dbenham Jul 10 '17 at 03:50
  • Thanks a lot for the help! I managed to make FINDSTR work. Again, thanks a lot. – Hikari Aug 08 '17 at 20:50