
I am trying to extract all the differences between two files as strings with Python. For example, by doing this:

import difflib

# Read both files as lists of lines
with open('first_file', encoding='utf-8') as first, open('second_file', encoding='utf-8') as second:
    first_file_lines = first.readlines()
    second_file_lines = second.readlines()

# Build the HTML report; the last two arguments are the column headings
difference = difflib.HtmlDiff().make_file(
    first_file_lines, second_file_lines, 'first_file', 'second_file')
with open('difference.html', 'w', encoding='utf-8') as difference_report:
    difference_report.write(difference)

I can generate an HTML file and open it in a browser to see all the differences as strings. They are highlighted with color coding: green for Added, yellow for Changed and red for Deleted. Instead of producing an HTML file with color-coded markup, I want to extract all the differences as strings and put them in something iterable. My ultimate goal is to check whether a string of mine is among the strings newly added in the second file, so a different approach is welcome.

  • Does this answer your question? [Python - difference between two strings](https://stackoverflow.com/questions/17904097/python-difference-between-two-strings) – SuperStormer Jun 11 '22 at 21:40

1 Answer


Here's a suggestion for comparing two files and getting a list of the lines that appear in only one of them.

Let's assume 2 files:
file1.txt with contents

line1
line2
line3 and more
line 4
line 22

file2.txt with contents

line1
line3
lin2
line2
line44

You could then read the files into two separate lists and compare them:

# Files to be opened
file1 = 'file1.txt'
file2 = 'file2.txt'

# Read a file into a list of its lines, with trailing whitespace stripped
def file_2_list(path):
    with open(path) as handle:
        return [line.rstrip() for line in handle]

# Create Lists with each line in each file
list1 = file_2_list(file1)
list2 = file_2_list(file2)

# Outputs
print("File contents as lists:")
print(list1)
print(list2)
print("")
print("Are files the same?")
print(list1 == list2)

# Find differences: split lst1 into elements missing from lst2
# and elements present in both lists
def diff_lists(lst1, lst2):
    differences = []
    both = []
    for element in lst1:
        if element not in lst2:
            differences.append(element)
        else:
            both.append(element)
    return differences, both

print("")
print("Lines that are in file1 but not in file2:")
diff, both = diff_lists(list1, list2)
print(diff)

print("")
print("Lines that are in file2 but not in file1:")
diff, both = diff_lists(list2, list1)
print(diff)

print("")
print("Lines that are both in file2 and in file1:")
print(both)

Output

File contents as lists:
['line1', 'line2', 'line3 and more', 'line 4', 'line 22', '']
['line1', 'line3', 'lin2', 'line2', 'line44', '']

Are files the same?
False

Lines that are in file1 but not in file2:
['line3 and more', 'line 4', 'line 22']

Lines that are in file2 but not in file1:
['line3', 'lin2', 'line44']

Lines that are both in file2 and in file1:
['line1', 'line2', '']

The resulting lists are iterable, as you were looking for.
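As an alternative to the membership test above, here is a minimal sketch that stays with difflib, as in the question. It assumes that "newly added" means the lines that difflib.ndiff prefixes with '+ ', i.e. lines present only in the second sequence. The helper name added_lines, the sample lists and my_string are hypothetical and only there for illustration.

```python
import difflib

def added_lines(first_lines, second_lines):
    """Return the lines that ndiff marks with '+ ' (present only in second_lines)."""
    diff = difflib.ndiff(first_lines, second_lines)
    # Each ndiff line starts with a two-character code: '  ', '- ', '+ ' or '? '
    return [line[2:] for line in diff if line.startswith('+ ')]

# Hypothetical sample data standing in for the contents of the two files
first = ['line1', 'line2', 'line3 and more']
second = ['line1', 'line3', 'lin2', 'line2', 'line44']

added = added_lines(first, second)

my_string = 'line44'  # hypothetical search string
found = any(my_string in line for line in added)
print(added)
print(found)
```

Unlike the "not in" check, ndiff compares the sequences in order, so a line that merely moved to a different position may also be reported as added. Whether that is desirable depends on what you count as a difference.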

alphazeta
  • Thanks. This gives me file1 and file2 as lists, then the lines of file1 that are not in file2 and vice versa, and finally the lines common to both. I am able to achieve this. But I want to extract, as strings, all the differences within each line and check whether my string is among them. Another approach I thought of is to do what you suggested, or something similar, and then count the occurrences of my string in list 1 and list 2. If there are more in list 2, that gives me the desired result. Still, any other suggestions are welcome. – some questions Jun 12 '22 at 20:54
  • I think a combination of my answer and the comment from @SuperStormer above pointing to the below should get you there. https://stackoverflow.com/questions/17904097/python-difference-between-two-strings – alphazeta Jun 13 '22 at 21:46
  • If the answer is acceptable, upvote it – alphazeta Jun 25 '22 at 08:18
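
For the differences within a single line that the comments ask about, one possible sketch uses difflib.SequenceMatcher on a pair of strings and keeps the parts of the new string that are reported as inserted or replaced. The helper name added_fragments and the sample strings are hypothetical. How old and new lines get paired up is left open here and is an assumption you would have to settle for your own files.

```python
import difflib

def added_fragments(old, new):
    """Return the substrings of `new` that SequenceMatcher reports as inserted or replaced."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [new[j1:j2]
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag in ('insert', 'replace')]

# Hypothetical pair of lines: the old version and the new version
fragments = added_fragments('line3', 'line3 and more')

my_string = 'more'  # hypothetical search string
found = any(my_string in fragment for fragment in fragments)
print(fragments)
print(found)
```

Note that a search string which straddles an unchanged part and an added part of the line will not be found in any single fragment, so for that case checking the whole added line may be the safer test.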