I have searched for similar questions on SO but didn't find anything that worked for me.

I have two large files that should be identical, but one of them is 60 lines longer than the other. I want to know what those lines are and where to find them.

I've read that difflib can do this, but I can't figure out how to use it. I always get + and - markers in the output, which I don't want. I just want to scan both files and write the 60 differing lines to a third file.

I wrote this code, but it does not print out the differing lines.

f1 = open('file1.txt','r')
f2 = open('file2.txt','r')
f3 = open('file3.txt','w')

diff = set(f1).difference(f2)
same.discard('\n')

for line in same:
    f3.write(line)
Cave
  • Possible duplicate of [Compare two files report difference in python](https://stackoverflow.com/questions/19120489/compare-two-files-report-difference-in-python) – Ken Y-N Oct 06 '17 at 04:42
  • Can't you just run `diff` from your shell? Why do you need to write your own utility? It's a solved problem. – Tom Karzes Oct 06 '17 at 04:45
  • what's the output that you're getting? – Van Peer Oct 06 '17 at 04:51

2 Answers

5

Well, you could do something like this:

with open('file1.txt') as infile:
    f1 = infile.readlines()

with open('file2.txt') as infile:
    f2 = infile.readlines()

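# Build sets once so each membership test is O(1) instead of a scan of the whole list
f1_lines, f2_lines = set(f1), set(f2)
only_in_f1 = [i for i in f1 if i not in f2_lines]
only_in_f2 = [i for i in f2 if i not in f1_lines]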

with open('file3.txt', 'w') as outfile:
    if only_in_f1:
        outfile.write('Lines only in file 1:\n')
        for line in only_in_f1:
            outfile.write(line)

    if only_in_f2:
        outfile.write('Lines only in file 2:\n')
        for line in only_in_f2:
            outfile.write(line)

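Note: lines are compared by content, not by position, so a line is reported only if it appears nowhere in the other file. Any difference in whitespace, including a missing newline at the end of a file, counts as a difference.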
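If you also need to know where the extra lines sit, one possible sketch uses `difflib.SequenceMatcher` and reads its opcodes directly, so no + or - markers end up in the output. The helper name `unique_lines` and the sample lists are made up for illustration. In practice you would pass the results of `readlines()` for each file.

```python
import difflib

def unique_lines(a_lines, b_lines):
    """Return (only_in_a, only_in_b) as lists of (line_number, text)."""
    # autojunk=False turns off the heuristic that ignores very frequent lines
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    only_in_a, only_in_b = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'delete' and 'replace' cover lines present only in a
        if tag in ('delete', 'replace'):
            only_in_a.extend((n + 1, a_lines[n]) for n in range(i1, i2))
        # 'insert' and 'replace' cover lines present only in b
        if tag in ('insert', 'replace'):
            only_in_b.extend((n + 1, b_lines[n]) for n in range(j1, j2))
    return only_in_a, only_in_b

# Small in-memory example standing in for the two files
a = ['one\n', 'two\n', 'three\n']
b = ['one\n', 'extra\n', 'two\n', 'three\n']
only_a, only_b = unique_lines(a, b)
```

Unlike the membership test above, this comparison is order-sensitive: a line that was moved is reported in both lists. It can also be slow on very large files, so treat it as a starting point rather than a tuned solution.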

Van Peer
Evan Nowak
2

You can easily solve this using sets.

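file1 = 'file1.txt'
file2 = 'file2.txt'
diff_file = 'file3.txt'

def read_lines(path):
    # Collect every line of the file into a set, stripped of surrounding whitespace
    with open(path) as f:
        return {line.strip() for line in f}

set1 = read_lines(file1)
set2 = read_lines(file2)

# Write the lines that appear in file 2 but not in file 1
with open(diff_file, 'w') as f:
    for line in set2 - set1:
        f.write(line + '\n')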
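One caveat with plain sets is that duplicates collapse: if the longer file simply repeats a line the other file already has, the set difference comes out empty. A minimal sketch of a workaround, assuming line order does not matter, counts occurrences with `collections.Counter` instead. The function name `multiset_diff` and the sample data are hypothetical.

```python
from collections import Counter

def multiset_diff(lines_a, lines_b):
    """Return (extra_in_a, extra_in_b) as Counters of surplus lines."""
    count_a = Counter(line.rstrip('\n') for line in lines_a)
    count_b = Counter(line.rstrip('\n') for line in lines_b)
    # Counter subtraction keeps only the positive counts
    return count_a - count_b, count_b - count_a

# 'x' appears once more in the second list, and 'z' is new there
extra_a, extra_b = multiset_diff(['x\n', 'y\n'], ['x\n', 'x\n', 'y\n', 'z\n'])
```

Each resulting counter maps a line to how many more times it occurs in that file, and `extra_b.elements()` yields each surplus line the right number of times if you want to write them out.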
N M