
I have File1.txt with this content:

aaa
ccc

and File2.txt with this content:

aaa
bbb
ccc

I want to delete from File2.txt all lines that are also found in File1.txt. In this example, File2.txt would be left with a single line, "bbb".

How can I achieve this with a batch file?

Thank you,

user3683832

2 Answers

1
for /f "tokens=*" %%a in (file1.txt) do (
    REM A third file is needed because a command cannot redirect its output
    REM into the file it is reading. Doing so would leave file2.txt empty.
    if exist file3.txt del file3.txt
    TYPE file2.txt |find /i /v "%%a">file3.txt
    COPY /y file3.txt file2.txt
)

This only works if the files do not contain quote characters ("), because a quote would break the quoting in find /i /v "%%a". The magic lies in the /v switch of the find command: it shows only the lines that do not contain the requested string.

AcidJunkie
  • I'm assuming the OP is looking for exact matches, in which case this will fail if File2 contains `aaaa` because FIND will match `aaa` to `aaaa`. – dbenham Aug 05 '14 at 10:56
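
The substring problem described in the comment above can be reproduced with a small sketch. The scratch file name `find_demo.txt` and its contents are made up for illustration; the second command uses FINDSTR with /x, which only matches whole lines.

```batch
@echo off
setlocal
rem Hypothetical scratch file, not part of the original question.
set "f=%TEMP%\find_demo.txt"
(echo aaa&echo aaaa&echo bbb)>"%f%"

echo FIND /V drops every line containing "aaa", including "aaaa":
type "%f%" | find /i /v "aaa"

echo FINDSTR /V /X drops only lines that are exactly "aaa":
findstr /v /l /i /x /c:"aaa" "%f%"

del "%f%"
endlocal
```

The first command should leave only bbb, while the second should keep both aaaa and bbb.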
0

If you can afford to ignore case when comparing strings, then there is a simple solution using FINDSTR.

findstr /vlixg:"file1.txt" "file2.txt" >"file2.txt.new"
move /y "file2.txt.new" "file2.txt" >nul

The above will not work properly if File1.txt contains \\ or \". Such strings would have to be escaped as \\\ (or \\\\) and \\" (or \\\").

The search must ignore case because of a nasty FINDSTR bug; see "Why doesn't this FINDSTR example with multiple literal search strings find a match?"
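
For readers unfamiliar with the combined /vlixg: switch string, here is the same FINDSTR call with each switch written separately and commented. It is only a sketch: the scratch folder `%TEMP%\findstr_demo` is a hypothetical location chosen so the demo does not touch real files, and the sample data is copied from the question.

```batch
@echo off
setlocal
rem Hypothetical scratch folder, an assumption made for this demo.
set "demo=%TEMP%\findstr_demo"
if not exist "%demo%" mkdir "%demo%"
pushd "%demo%"

rem Recreate the sample files from the question.
(echo aaa&echo ccc)>"file1.txt"
(echo aaa&echo bbb&echo ccc)>"file2.txt"

rem /v   print only the lines that do NOT match
rem /l   treat search strings as literals, not regular expressions
rem /i   ignore case
rem /x   a line matches only if the whole line equals a search string
rem /g:  read the search strings from the named file, one per line
findstr /v /l /i /x /g:"file1.txt" "file2.txt" >"file2.txt.new"
move /y "file2.txt.new" "file2.txt" >nul

type "file2.txt"
popd
endlocal
```

With the sample data, the final TYPE should show the single line bbb.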

Below is a robust, case-sensitive, but slow, pure native batch solution that reads File1 one line at a time and removes each such line from File2. It uses a temporary file to hold the search string. It could be done with a search string on the command line, except that there is an obscure, problematic case involving \\ and \" characters. See the section titled "Escaping Backslash within command line literal search strings" within "What are the undocumented features and limitations of the Windows FINDSTR command?" for more information.

The odd FOR /F syntax is used to disable the EOL and DELIMS options. The USEBACKQ option is only added in case you change the file name to a name with spaces. The toggling of delayed expansion is used to protect ! characters that may be within File1.

@echo off
setlocal disableDelayedExpansion
for /f usebackq^ delims^=^ eol^= %%A in ("File1.txt") do (
  set "ln=%%A"
  setlocal enableDelayedExpansion
  (echo(!ln:\=\\!)>"File1.txt.new"
  endlocal
  findstr /vlxg:"File1.txt.new" "File2.txt" >"File2.txt.new"
  move /y "File2.txt.new" "File2.txt" >nul
)
del "File1.txt.new" 2>nul
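
The effect of disabling EOL and DELIMS can be seen in isolation with the following sketch. The scratch file `forf_demo.txt` and its two sample lines are invented for the demonstration; the second loop uses the same FOR /F syntax as the script above.

```batch
@echo off
setlocal disableDelayedExpansion
rem Hypothetical scratch file, an assumption made for this demo.
set "sample=%TEMP%\forf_demo.txt"
rem One line starting with ";" and one line containing a space and a "!".
(echo ;starts with semicolon&echo two words!)>"%sample%"

echo Default options:
for /f "usebackq" %%A in ("%sample%") do echo [%%A]

echo EOL and DELIMS disabled:
for /f usebackq^ delims^=^ eol^= %%A in ("%sample%") do echo [%%A]

del "%sample%"
endlocal
```

With the default options, the first loop should skip the line that begins with ; and print only [two], because the default EOL character is ; and the default delimiters are space and tab. The second loop should print both lines in full.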

Finally, if you are willing to use some hybrid scripting, then the following is both robust and very efficient. It relies on a hybrid JScript/batch utility called REPL.BAT to escape all regex metacharacters within File1.

@echo off
type "File1.txt"|repl "([[\\.*^$])" \$1 >"File1.txt.new"
findstr /vrxg:"File1.txt.new" "File2.txt" >"File2.txt.new"
move /y "File2.txt.new" "File2.txt" >nul
del "File1.txt.new"
dbenham