You can complete your solution by computing the symmetric difference:
with open('output.txt', 'w') as fout:
    fout.write("\n".join(new_1.symmetric_difference(new_2)))
The problem with your initial solution is that new_1 - new_2 gives you only the lines in file 1 that are not in file 2. Lines that are in file 2 but not in file 1 are never written to output.txt. What you want is the union of new_1 - new_2 and new_2 - new_1, which is the symmetric difference. If you don't care about capturing duplicate lines or preserving line order between the files, the symmetric set difference should be sufficient.
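As a quick check of that claim, here is a minimal sketch using two small sets. The sample values are hypothetical stand-ins for the lines of the two files, not data from the question.

```python
# Hypothetical sample lines standing in for the contents of the two files.
new_1 = {"a", "b", "c"}
new_2 = {"b", "c", "d"}

only_in_1 = new_1 - new_2  # lines only in file 1
only_in_2 = new_2 - new_1  # lines only in file 2

# The union of the two one-sided differences is the symmetric difference.
sym_diff = new_1.symmetric_difference(new_2)
print(sym_diff == only_in_1 | only_in_2)  # True
print(sorted(sym_diff))                   # ['a', 'd']
```

Because these are sets, a line that occurs twice in a file is counted once, and the iteration order of the result is not tied to the order of lines in either file.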
However, I would suggest using difflib from Python's standard library, which is built for just this. The code snippet below writes the same output as the one in your example (with a trailing newline), but it also preserves duplicate lines and relative line ordering between arbitrary input files:
import difflib
with open('file_1.txt', 'r') as f:
    new_1 = [line.strip() for line in f]
with open('file_2.txt', 'r') as f:
    new_2 = [line.strip() for line in f]
difflines = list(difflib.unified_diff(new_1, new_2, lineterm=""))
# Skip the three header lines (---, +++, @@), then keep only added or
# removed lines and drop their leading +/- marker.
with open('output.txt', 'w') as fout:
    for line in difflines[3:]:
        if line.startswith("+") or line.startswith("-"):
            fout.write(line[1:] + "\n")
To understand the indexing in the last three lines of this snippet, it helps to inspect the output of difflib.unified_diff() in the following snippet:
diff = difflib.unified_diff(new_1, new_2, fromfile='file_1.txt', tofile='file_2.txt', lineterm="")
print("\n".join(diff))
The above will print the following, where lines prefixed with a - are present only in file_1.txt, lines prefixed with a + are present only in file_2.txt, and lines prefixed with a space are present in both files:
--- file_1.txt
+++ file_2.txt
@@ -1,4 +1,3 @@
 000b423573 bdbaskbjejbajbkjfsjba
-00036713dc sjgdjgdgdjadgygdeg263
+7736001772 absjueui3ryhfuhuffh3u
 00123fd351 heqgrg63u1quidg87gduq
-0105517f52 vgfeeyguuiduiueyruuur
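To tie the diff output back to the filtering step, here is a rough sketch that runs the same logic on in-memory lists instead of files. The helper name changed_lines is hypothetical, and the two lists are reconstructed from the diff output above.

```python
import difflib

# Line lists reconstructed from the example diff output above.
file_1 = [
    "000b423573 bdbaskbjejbajbkjfsjba",
    "00036713dc sjgdjgdgdjadgygdeg263",
    "00123fd351 heqgrg63u1quidg87gduq",
    "0105517f52 vgfeeyguuiduiueyruuur",
]
file_2 = [
    "000b423573 bdbaskbjejbajbkjfsjba",
    "7736001772 absjueui3ryhfuhuffh3u",
    "00123fd351 heqgrg63u1quidg87gduq",
]

def changed_lines(old, new):
    """Hypothetical helper: lines found on only one side, in diff order."""
    diff = list(difflib.unified_diff(old, new, lineterm=""))
    # diff[0:3] holds the '---', '+++' and first '@@' header lines.
    return [line[1:] for line in diff[3:] if line.startswith(("+", "-"))]

for line in changed_lines(file_1, file_2):
    print(line)
# 00036713dc sjgdjgdgdjadgygdeg263
# 7736001772 absjueui3ryhfuhuffh3u
# 0105517f52 vgfeeyguuiduiueyruuur
```

Slicing off the first three entries removes the headers by position, so a changed line whose own text happens to begin with - or + is still kept. Any later @@ hunk headers in a longer diff are dropped by the prefix check, since they start with neither + nor -.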
For more information about how this works, see the Python difflib docs.