
I have a Python script (courtesy of Alexander, from "Comparing large files with grep or python") that compares two lists of strings.

Now I want to modify it to clean up the lists and remove the repeated strings:

filename_1 = 'A.txt'
filename_2 = 'B.txt'
filename_3 = 'C.txt'
with open(filename_1, 'r') as f1, open(filename_2, 'r') as f2, open(filename_3, 'w') as fout:
    s = set(val.strip() for val in f1.readlines())
    for row in f2:
        row = row.strip()
        if row not in s:
            fout.write(row + '\n')

Contents of the lists:

 A.txt
 string1
 string2

 B.txt
 string1
 string3

Expected result:

 C.txt
 string1
 string2
 string3

Thanks

P.S.: I am new here and I apologize. What I really need is to remove the contents of B from list A. Thanks anyway.

After some research, this is the answer (3 cases):

Remove the contents of B.txt from the A.txt list and write the result to C.txt

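# Read each file into a set of stripped, lowercased lines
with open('A.txt') as fa, open('B.txt') as fb:
    a = set(line.strip().lower() for line in fa)
    b = set(line.strip().lower() for line in fb)
with open('C.txt', 'w') as fout:
    fout.write("\n".join(a.difference(b)))  # lines in A but not in B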

Compare A.txt and B.txt and write the lines that are new in B.txt to C.txt

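# Read each file into a set of stripped, lowercased lines
with open('A.txt') as fa, open('B.txt') as fb:
    a = set(line.strip().lower() for line in fa)
    b = set(line.strip().lower() for line in fb)
with open('C.txt', 'w') as fout:
    fout.write("\n".join(b.difference(a)))  # lines in B but not in A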

Merge the contents of A.txt and B.txt into C.txt

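# Read each file into a set of stripped, lowercased lines
with open('A.txt') as fa, open('B.txt') as fb:
    a = set(line.strip().lower() for line in fa)
    b = set(line.strip().lower() for line in fb)
with open('C.txt', 'w') as fout:
    fout.write("\n".join(b | a))  # union: every distinct line from A and B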
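
The three cases above can be checked without touching any files. The following is only a sketch run on the sample data from the question: the helper names `normalize` and `compare_lists` are made up for illustration, blank lines are skipped (an assumption, the snippets above keep them), and the results are sorted only so that the output is predictable, since sets have no defined order.

```python
def normalize(lines):
    """Strip whitespace and lowercase each line, skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def compare_lists(a_lines, b_lines):
    """Return the three results discussed above as sorted lists."""
    a, b = normalize(a_lines), normalize(b_lines)
    return {
        "a_minus_b": sorted(a - b),  # case 1: A with the contents of B removed
        "b_minus_a": sorted(b - a),  # case 2: lines that are new in B
        "merged": sorted(a | b),     # case 3: every distinct line from A and B
    }


# Sample data from the question (A.txt and B.txt as lists of lines)
result = compare_lists(["string1\n", "string2\n"], ["string1\n", "string3\n"])
for name, values in result.items():
    print(name, values)
```

Note that `.lower()` means `String1` and `string1` are treated as the same line, and the lowercased form is what gets written out.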
acgbox
  • wait so what have you tried and what problem did you run into? – OLIVER.KOO Aug 08 '17 at 15:51
  • remove the repeated strings – acgbox Aug 08 '17 at 15:53
  • remove the repeated strings from listA and listB then save the result to list C? if so, why does listC contain `string2` when it is repeated in listA and listB ? – OLIVER.KOO Aug 08 '17 at 15:54
  • @acaler It looks like you're on Stack Overflow to make other people write code for you. You aren't making any effort to understand why the code works. Flagged as I don't condone this kind of behavior, and that is not what Stack Overflow is about. – JoshuaRLi Aug 08 '17 at 15:54
  • @JoshuaRLi I am sorry. I am new and I do not know your rules. I did not know that these kinds of questions were forbidden – acgbox Aug 08 '17 at 17:01
  • @acaler Not forbidden, but frowned upon by some because many questions can be answered simply by Googling your issues. If you show some effort (what you've tried, what failed) before asking questions, it'll show for itself. – JoshuaRLi Aug 08 '17 at 17:04
  • @JoshuaRLi I investigated and I just published the result, for the benefit of the community. Correct your rating. Thank you – acgbox Aug 08 '17 at 17:57
  • @acaler Great to hear :) Done. – JoshuaRLi Aug 08 '17 at 19:04

1 Answer


The loop writes the items in f2 that are not in f1 to the start of the file, so afterwards just append all of the contents of f1 (held in the set s) to the result.

with open(filename_1, 'r') as f1, open(filename_2, 'r') as f2, open(filename_3, 'w') as fout:
    s = set(val.strip() for val in f1.readlines())
    for row in f2:
        row = row.strip()
        if row not in s:
            fout.write(row + '\n')
    for row in s:
        fout.write(row + '\n')
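
One thing to be aware of: iterating over a set gives the lines in arbitrary order, so C.txt will not necessarily come out as string1, string2, string3. If the order matters, a possible variant (a sketch, not part of the code above; the function name `merge_unique` is hypothetical, and skipping blank lines is an assumption) relies on `dict.fromkeys`, which keeps the first occurrence of each key in insertion order on Python 3.7+:

```python
from itertools import chain


def merge_unique(lines_a, lines_b):
    """Merge two iterables of lines, dropping repeats but keeping first-seen order."""
    stripped = (line.strip() for line in chain(lines_a, lines_b))
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(s for s in stripped if s))


# Sample data from the question
merged = merge_unique(["string1\n", "string2\n"], ["string1\n", "string3\n"])
print(merged)
```

With files, the same function can be called on the open file objects `f1` and `f2`, and the result written out with `fout.write("\n".join(merged) + "\n")`.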
Alexander