
I'm trying to create a batch file that creates fileC.txt, containing all lines of fileA.txt except those that contain any of the strings listed (one per line) in fileB.txt:

Pseudocode:

foreach(line L in fileA.txt)
     excluded = false
     foreach(string str in fileB.txt)
          if L contains str
               excluded = true
     if !excluded
          add L to fileC.txt


For example

fileA.txt: (all)

this\here\is\a\line.wav
and\this\is\another.wav
i\am\a\chocolate.wav
peanut\butter\jelly\time.wav

fileB.txt: (those to be excluded)

another.wav
time.wav

fileC.txt: (wanted result)

this\here\is\a\line.wav
i\am\a\chocolate.wav

I've been fiddling around with FINDSTR, but I just can't seem to puzzle it together. Any help or pointers would be greatly appreciated!

Cheers! / Fredde

happytrooper

1 Answer


The answer should be this simple:

findstr /lvg:"fileB.txt" "fileA.txt" >fileC.txt

And with your example, the above does give the correct results.

But there is a nasty FINDSTR bug that makes it unreliable when using multiple case-sensitive literal search strings. See "Why doesn't this FINDSTR example with multiple literal search strings find a match?", as well as the answer that goes with it. For a "complete" list of undocumented FINDSTR features and bugs, see "What are the undocumented features and limitations of the Windows FINDSTR command?".

So the simple code above can fail depending on the content of the files. If you can get away with using a case-insensitive search, then the solution is simple:

findstr /livg:"fileB.txt" "fileA.txt" >fileC.txt

Edit: Both versions above will fail if fileB.txt contains \\ or \". In order to work properly, those strings must be escaped as \\\ and \\"

But if you must use a case sensitive search, then there is no simple solution. Your best bet for a pure batch solution might be to use the /R regular expression option. But then you will have to create a modified version of fileB.txt where all regex meta-characters are escaped so that the strings give the correct literal search. That is a mini project in and of itself.

Perhaps your best option for a case sensitive solution is to get a 3rd party tool like grep or sed for Windows.
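
For illustration only, and assuming Python happens to be installed (it is not mentioned in the question, and it is a different tool from grep or sed), the case-sensitive filter could be sketched roughly as below. The names filter_lines and write_filtered are made up for this sketch, and a plain substring test stands in for FINDSTR's literal search.

```python
from pathlib import Path


def filter_lines(all_lines, excluded_strings):
    """Return the lines that contain none of the excluded strings (case sensitive)."""
    # An empty search string would match every line, so skip blank entries.
    needles = [s for s in excluded_strings if s]
    return [line for line in all_lines if not any(n in line for n in needles)]


def write_filtered(source="fileA.txt", exclude="fileB.txt", target="fileC.txt"):
    """Write to target every line of source that contains no line of exclude."""
    # splitlines() accepts both <CR><LF> and <LF> line endings.
    lines = Path(source).read_text().splitlines()
    needles = Path(exclude).read_text().splitlines()
    kept = filter_lines(lines, needles)
    Path(target).write_text("".join(line + "\n" for line in kept))
```

Calling write_filtered() from the folder that holds fileA.txt and fileB.txt would write fileC.txt. Because the comparison is a plain substring test, backslashes and quotes in fileB.txt need no escaping in this sketch.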

Edit: Here is a reasonably performing pure batch solution that is nearly bulletproof.

I looked into doing something like the proposed logic in your question. But using batch to read all lines in a file is relatively slow. This solution only reads the exclude file line by line. It uses FINDSTR to read the lines in "fileA.txt" repeatedly, once per search string. This is a much faster algorithm for a batch file.

The traditional method to read a file is to use a FOR /F loop, but there is another technique using SET /P that is faster, and it is safe to use with delayed expansion. The only limitations to this method are:

  • It strips trailing control characters from the line
  • It is limited to 1021 bytes per line
  • Each line must be terminated by <CR><LF>, as is the Windows standard. It will not work with Unix-style lines terminated by <LF>

The search strings must have each \ and " escaped as \\ and \" when they are used with the /C option.

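@echo off
setlocal enableDelayedExpansion
rem Start with a full copy of fileA.txt, then filter it once per search string
copy fileA.txt fileC.txt >nul
rem Count the lines in fileB.txt so the loop knows how many reads to perform
for /f %%N in ('find /c /v "" ^<fileB.txt') do set len=%%N
<fileB.txt (
  for /l %%N in (1 1 !len!) do (
    rem Read the next search string from the redirected fileB.txt
    set "ln="
    set /p "ln="
    if defined ln (
      rem Escape each backslash and quote as required by the /C option
      set "ln=!ln:\=\\!"
      set ln=!ln:"=\"!
      rem Drop every line that contains the literal search string
      move /y fileC.txt temp.txt >nul
      findstr /lv /c:"!ln!" temp.txt >fileC.txt
    )
  )
)
del temp.txt
type fileC.txt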
dbenham
  • Incredibly elaborate and satisfying answer! Even though your first solution was sufficient in my case* I very much enjoyed reading your entire post! I would upvote it if I could and will when I can! I bow before you, o wise dbenham! Thanks :) *pun intended – happytrooper May 08 '12 at 08:40