
I have two text files. How can I compare them? Basically, what I am after is something that takes the first line from text file 1 and compares it against all the lines in text file 2; if that line does not appear, write that line to text file 3.

Then check the next line in text file 1 against all the lines in text file 2, and so on.
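For illustration only, here is a minimal Python sketch of the procedure described above. The function names `missing_lines` and `compare_files` are hypothetical. Instead of rescanning file 2 for every line of file 1, it loads file 2 into a set once; the result is the same as the line-by-line comparison, assuming exact, case-sensitive matching of whole lines.

```python
from pathlib import Path


def missing_lines(lines1, lines2):
    """Return the lines of lines1 that do not appear anywhere in lines2."""
    # Load file 2 once into a set so each membership test is fast,
    # instead of rescanning all of file 2 for every line of file 1.
    seen = set(lines2)
    return [line for line in lines1 if line not in seen]


def compare_files(file1, file2, file3):
    """Write each line of file1 that is absent from file2 into file3."""
    # splitlines() removes the line terminators, so only the text is compared.
    lines1 = Path(file1).read_text().splitlines()
    lines2 = Path(file2).read_text().splitlines()
    missing = missing_lines(lines1, lines2)
    Path(file3).write_text("".join(line + "\n" for line in missing))
```

The output keeps the original order of file 1, including any duplicate lines.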

user1221996
You already answered your question. You asked how you can compare 2 text files, and then you explained how you can do it. So, what is the question now? – Shakti Prakash Singh Sep 10 '12 at 11:56


The problem is trivial if you have a copy of grep for Windows. One good free source is GnuWin. You can download individual utilities like grep from the packages link, or you can get the entire GnuWin suite using the Download all link (click on the download button at the beginning of that page).

grep -v -x -F -f file2.txt file1.txt >file3.txt

-v = Inverts the match logic - lists lines that don't match

-x = The entire line must match exactly

-F = The search strings are string literals instead of regular expressions

-f file2.txt = Get the search strings from file2.txt
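To show what the flag combination means, here is a rough Python model (not a reimplementation of grep; the function `grep_filter` and its parameters are hypothetical). `-F` corresponds to plain string comparison instead of a regex, `-x` to whole-line equality, and `-v` to keeping the lines that match none of the patterns.

```python
def grep_filter(patterns, lines, whole_line=True, invert=True):
    """Rough model of `grep -F -f patterns` with optional -x and -v.

    -F: patterns are literal strings, so no regular expressions are used.
    -x (whole_line=True): a pattern must equal the entire line.
    -v (invert=True): keep the lines that do NOT match any pattern.
    """
    def matches(line):
        if whole_line:
            return any(line == p for p in patterns)
        # Without -x, a literal pattern may match anywhere inside the line.
        return any(p in line for p in patterns)

    return [line for line in lines if matches(line) != invert]
```

With the defaults, `grep_filter(["apple"], ["apple", "apple pie", "pear"])` keeps `"apple pie"` and `"pear"`; dropping `-x` (`whole_line=False`) would also discard `"apple pie"`.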


You can almost do the same thing using the native FINDSTR command, except there are 2 problems:

1) Any backslash character \ in a search string must be escaped as \\, even when specifying a literal search.

2) There is a nasty FINDSTR bug that causes some matches to be missed if multiple case sensitive literal search strings are used.

See "What are the undocumented features and limitations of the Windows FINDSTR command?" for a "complete" list of undocumented FINDSTR issues.

The following will work as long as it is acceptable to do a case insensitive search and file2 does not contain any \ characters:

findstr /x /v /i /l /g:file2.txt file1.txt >file3.txt
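As a hedged approximation of what that FINDSTR command computes, here is a Python sketch (the function name is hypothetical). It uses `str.casefold()` for the case-insensitive comparison, which is an assumption: FINDSTR's own case folding rules may differ for non-ASCII characters.

```python
def missing_lines_ignore_case(lines1, lines2):
    """Approximate `findstr /x /v /i /l /g:file2 file1`.

    /l and /x: each search string is a literal compared with the whole line.
    /i: the comparison ignores case (casefold() is an approximation).
    /v: keep only the lines of file1 that have no match in file2.
    """
    # Case-fold file2 once so each lookup is a fast set membership test.
    folded = {line.casefold() for line in lines2}
    return [line for line in lines1 if line.casefold() not in folded]
```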

The backslash limitation can be eliminated by creating a temp file that escapes the backslashes. It is a bit of code, but the end result still runs fairly quickly. The search must still be case insensitive.

@echo off
setlocal disableDelayedExpansion

::Define the files
set "file1=test1.txt"
set "file2=test2.txt"
set "file3=test3.txt"

::Create an LF variable containing a line feed character
set LF=^


::The above 2 blank lines are critical - do not remove

::Create a modified version of file2 that escapes any backslash
::EOL is set to a linefeed so that all non blank lines are preserved
::Delayed expansion is toggled on and off to protect ! characters
>"%file2%.mod" (
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file2%") do (
    set "ln=%%A"
    setlocal enableDelayedExpansion
    echo(!ln:\=\\!
    endlocal
  )
)

::Find lines in file1 that are missing from file2.mod
findstr /vixlg:"%file2%.mod" "%file1%" >"%file3%"

::Delete the temporary file2.mod
del "%file2%.mod"
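The escaping step in the script above can be modelled in Python as follows (a sketch; the function names are hypothetical). It doubles every backslash, like the batch substitution `!ln:\=\\!`, and skips empty lines because FOR /F skips them too.

```python
from pathlib import Path


def escape_backslashes(line):
    # Equivalent to the batch substitution !ln:\=\\! : every \ becomes \\.
    return line.replace("\\", "\\\\")


def write_escaped_copy(src, dst):
    """Create the escaped search-string file (the script's "%file2%.mod")."""
    lines = Path(src).read_text().splitlines()
    # FOR /F skips empty lines, so they are dropped here as well.
    escaped = [escape_backslashes(line) for line in lines if line]
    Path(dst).write_text("".join(line + "\n" for line in escaped))
```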

It is relatively simple to write a robust native batch solution using 2 FOR loops, but because every line of file1 is compared against every line of file2, the performance will deteriorate quickly if the files are large.

@echo off
setlocal disableDelayedExpansion

::Define the files
set "file1=test1.txt"
set "file2=test2.txt"
set "file3=test3.txt"

::Create an LF variable containing a line feed character
set LF=^


::The above 2 blank lines are critical - do not remove

::Find lines in file1 that are missing from file2
::EOL is set to a linefeed character so that all non blank lines are preserved
>"%file3%" (
  for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%A in ("%file1%") do (
    set "found="
    for /f usebackq^ eol^=^%LF%%LF%^ delims^= %%B in ("%file2%") do (
      if %%A==%%B set found=1
    )
    if not defined found echo(%%A
  )
)
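To make the performance remark concrete, here is a hedged Python model of the two nested FOR /F loops (the function name is hypothetical). Like the batch inner loop, it never exits early, so it always performs len(lines1) * len(lines2) comparisons; it also compares case-sensitively, as IF does without /I.

```python
def nested_loop_diff(lines1, lines2):
    """Mimic the nested FOR /F loops; return (missing lines, comparisons)."""
    missing, comparisons = [], 0
    for a in lines1:
        found = False
        for b in lines2:  # like the batch inner loop: no early exit
            comparisons += 1
            if a == b:  # case-sensitive, like IF without /I
                found = True
        if not found:
            missing.append(a)
    return missing, comparisons
```

For two 10,000-line files that is 100 million comparisons, which is why a hash-set lookup or the FINDSTR /G approach scales much better.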


There may be a simple and efficient native PowerShell solution, but that is not my expertise.

dbenham