I have two text files, file1 and file2. I want to identify which lines in file2 are not in file1. How can I do this using a DOS batch file?
-
Use a diff program... – Mitch Wheat Nov 15 '13 at 05:07
-
Try using the `fc` command. – Gabe Nov 15 '13 at 05:10
-
JFTR the `FC` command has bugs in text comparison mode. Are your two files large? A brute force compare would work on small files. – foxidrive Nov 15 '13 at 06:57
2 Answers
findstr /v /g:file1 file2
Use FINDSTR: `/G:file1` takes the strings to match from file1, the search is done in file2, and `/V` shows only the lines of file2 that do not match any string in file1.
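A more defensive variant, sketched from the options suggested in the comments and not tested against every possible input: `/X` makes each search string match only a whole line, `/L` forces literal (non-regex) matching, and `/I` works around the missed-match bug with multiple literal search strings, at the cost of making the comparison case-insensitive. Backslashes in file1 may still need to be doubled.

```batch
@echo off
rem Print the lines of file2 that do not exactly match any line of file1.
rem /V        print only the non-matching lines
rem /X        a search string must match the whole line
rem /L        treat the search strings as literals, not regular expressions
rem /I        ignore case; also avoids the multiple-literal-strings bug
rem /G:file1  read the search strings from file1
findstr /v /x /l /i /g:file1 file2
```

Because of `/I`, two lines that differ only in case are treated as the same line.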

MC ND
-
This simple answer will frequently fail, depending on file content, due to FINDSTR bugs and design issues. The `/X` option is needed. The `/L` option should be used to force literal interpretation of the strings, but then matches may be missed unless `/I` option is used due to [Why doesn't this FINDSTR example with multiple literal search strings find a match?](http://stackoverflow.com/q/8921253/1012053). Also, backslashes may need to be escaped in file1. See [What are the undocumented features and limitations of the Windows FINDSTR command?](http://stackoverflow.com/q/8844868/1012053) – dbenham Nov 15 '13 at 15:36
-
@dbenham, I know, and you are right. Which options are needed depends on what the OP really needs, but in any case, without more information, this is not the optimal tool for the job. – MC ND Nov 15 '13 at 16:18
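The batch file below assumes that neither file contains empty lines. It also assumes that every line of file1 appears in file2, in the same relative order (see the comments below).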
@echo off
setlocal EnableDelayedExpansion
rem Read file2 through stdin, one line per SET /P
< file2 (
   set /P line2=
   for /F "delims=" %%a in (file1) do (
      set "line1=%%a"
      call :SeekNextLineInFile2ThatMatchThisLineInFile1
   )
   rem An empty line1 matches nothing, so the rest of file2 is printed
   set "line1="
   call :SeekNextLineInFile2ThatMatchThisLineInFile1
)
goto :EOF

:SeekNextLineInFile2ThatMatchThisLineInFile1
rem Print lines of file2 until one equals the current line of file1
if not defined line2 exit /B
if "!line2!" equ "!line1!" goto lineFound
echo !line2!
set "line2="
set /P line2=
goto SeekNextLineInFile2ThatMatchThisLineInFile1
:lineFound
rem Skip the matching line and read the next one
set "line2="
set /P line2=
exit /B
file1:
First line in both files
Second line in both files
Third line in both files
Fourth line in both files
Fifth line in both files
file2:
First line in file2 not in file1
First line in both files
Second line in both files
Second line in file2 not in file1
Third line in file2 not in file1
Third line in both files
Fourth line in both files
Fourth line in file2 not in file1
Fifth line in file2 not in file1
Fifth line in both files
Sixth line in file2 not in file1
Output:
First line in file2 not in file1
Second line in file2 not in file1
Third line in file2 not in file1
Fourth line in file2 not in file1
Fifth line in file2 not in file1
Sixth line in file2 not in file1
EDIT: New method added
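The batch file below does not require the lines to be in any particular order. However, lines must not contain an equal sign, and exclamation marks are stripped. The comparison is not case sensitive, so two lines that differ only in case are treated as equal. In addition, lines that begin with a semicolon are skipped (the default EOL character of FOR /F), lines containing `[` or `]` are not reported correctly, and duplicate lines in file2 are reported only once.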
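@echo off
setlocal EnableDelayedExpansion
rem Read lines from file2: each line becomes part of a variable name,
rem and its value is the line number, zero-padded to 5 digits
set i=100000
for /F "delims=" %%a in (file2.txt) do (
   set /A i+=1
   set "line[%%a]=!i:~-5!"
)
rem Remove lines from file1 by undefining the matching variables
for /F "delims=" %%a in (file1.txt) do (
   set "line[%%a]="
)
rem SET lists variables in sorted order
echo Result in sorted order:
for /F "tokens=2 delims=[]" %%a in ('set line[ 2^>nul') do echo %%a
echo/
rem Prefix each remaining line with its line number, sort, then strip the number
echo Result in original file2 order:
(for /F "tokens=2* delims=[]=" %%a in ('set line[ 2^>nul') do echo %%bÿ%%a) > temp.txt
for /F "tokens=2 delims=ÿ" %%a in ('sort temp.txt') do echo %%a
del temp.txt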
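To see why the equal-sign restriction exists, here is a small sketch (not part of the original method) that mimics what the two loops do with a hypothetical line `a=b`. SET splits its argument at the first `=`, so the variable created is named `line[a`, and the later "removal" only assigns it a new value instead of undefining it.

```batch
@echo off
setlocal
rem What the first loop does for a hypothetical file2 line "a=b":
rem the name stops at the first "=", so this defines line[a as "b]=00001"
set "line[a=b]=00001"
rem What the second loop does for the same line in file1:
rem instead of undefining the variable, this sets line[a to "b]="
set "line[a=b]="
rem The variable still exists, so the line would not be removed
if defined line[a echo line[a is still defined
```

Running it prints `line[a is still defined`, which is why a line containing `=` can never be subtracted by this method.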

Aacini
-
I don't see how this can work at all since each file is only read once. I can see how it could be made to work if both files are sorted, but it will require more code. Or am I missing something? (I haven't tested) – dbenham Nov 15 '13 at 22:35
-
@dbenham: My code just had a small bug; I fixed it (two lines added). My solution answers the question as it was stated. See the added example. – Aacini Nov 15 '13 at 23:47
-
No, there is still a major bug. Copy the last line of file2 and insert as the first line of file1. Your code will report all lines as missing from file1 except for the just inserted line. Both files must be pre-sorted for your strategy to work. And the SORT command is not case sensitive, so it may or may not help, depending on requirements. – dbenham Nov 16 '13 at 15:56
-
@dbenham: I read "which lines in file2 are not there in file1?" and added "the other lines were there already", so I ended up with "which lines were added to file1 so that it becomes file2?", which, in my opinion, is what the OP really means. Of course, different interpretations are possible, but _with the data provided in the question_ my solution is valid. PS - We all know that the specifications of this question are not complete, so a definitive solution is not possible. If arbitrary changes were allowed between file1 and file2, we would have to program FINDSTR in Batch! Wouldn't we? – Aacini Nov 16 '13 at 22:51
-
I see nothing in the question that implies any order to the existing files. I read it as a simple set operation: lines in file2 minus lines in file1. Simple in theory, but difficult to do efficiently with pure batch, especially if case matters. – dbenham Nov 16 '13 at 23:38
-
@dbenham: Your last comment gives me an idea: result = file2 - file1. Easy! See the new method above. – Aacini Nov 17 '13 at 04:08
-
Question: Is there any way to combine the two `for` commands in my second code in order to avoid the auxiliary file? – Aacini Nov 17 '13 at 17:16