I have a Python script (courtesy of alexander, from "Comparing large files with grep or python") that filters two lists of strings.
Now I want to modify it so that it merges the lists and removes the repeated strings:
filename_1 = 'A.txt'
filename_2 = 'B.txt'
filename_3 = 'C.txt'
with open(filename_1, 'r') as f1, open(filename_2, 'r') as f2, open(filename_3, 'w') as fout:
    s = set(val.strip() for val in f1.readlines())
    for row in f2:
        row = row.strip()
        if row not in s:
            fout.write(row + '\n')
Contents of the lists:
A.txt
string1
string2
B.txt
string1
string3
Expected result:
C.txt
string1
string2
string3
Thanks
PS: I am new here and I apologize. What I really need is to remove the contents of B from list A. Thanks anyway.
After some research, this is the answer (3 cases):
Remove the contents of B.txt from the A.txt list and write the result to C.txt
a=set(line.strip().lower() for line in open('A.txt').readlines())
b=set(line.strip().lower() for line in open('B.txt').readlines())
open("C.txt", "w").write("\n".join(a.difference(b)))
Compare A.txt and B.txt and write the lines that are new in B.txt to C.txt
a=set(line.strip().lower() for line in open('A.txt').readlines())
b=set(line.strip().lower() for line in open('B.txt').readlines())
open("C.txt", "w").write("\n".join(b.difference(a)))
Merge the contents of A.txt and B.txt into C.txt
a=set(line.strip().lower() for line in open('A.txt').readlines())
b=set(line.strip().lower() for line in open('B.txt').readlines())
open("C.txt", "w").write("\n".join(b | a))
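One thing to be aware of with the three snippets above: a set has no defined order, so the lines in C.txt may come out shuffled, and the files are never closed explicitly. If the order matters (as in the expected result in the question), something along these lines might work. This is only a sketch: the helper names read_lines, unique, difference, merge and write_lines are made up for illustration, and skipping blank lines is my own assumption. The lower-casing is kept from the snippets above.

```python
def read_lines(path):
    """Return the stripped, lower-cased lines of a text file, in file order.

    Blank lines are skipped (an assumption, not part of the original snippets).
    """
    with open(path, "r") as f:
        return [line.strip().lower() for line in f if line.strip()]


def unique(items):
    """Drop duplicates while keeping first-seen order (dicts keep insertion order)."""
    return list(dict.fromkeys(items))


def difference(first, second):
    """Items of `first` that do not appear in `second`, order preserved."""
    exclude = set(second)  # set gives fast membership tests
    return [item for item in unique(first) if item not in exclude]


def merge(first, second):
    """All items of `first`, then the new items of `second`, without repeats."""
    return unique(list(first) + list(second))


def write_lines(path, lines):
    """Write one item per line; the file is closed when the block ends."""
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in lines)
```

With a = read_lines('A.txt') and b = read_lines('B.txt'), the three cases would then be write_lines('C.txt', difference(a, b)), write_lines('C.txt', difference(b, a)) and write_lines('C.txt', merge(a, b)).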