Compare two text files of words, check and erase duplicates, and merge with Python

Question

This may seem like a trivial question that has already been asked, but perhaps it will be helpful to others. It relates to an existing question, which I link here: comparing two text files and remove duplicates in python

Problem: I have two .txt files that both contain words arranged in lists (about 3 columns). I used the script attached below, which is based on the conversation in the link, but it doesn't actually produce a file that is the result of the comparison.

Let me explain: the goal is to generate a file that contains the words from both files, without duplicates.

I hope I have been sufficiently clear and I thank anyone of good will who wants to help me.

Here is the script, which doesn't achieve my goal:

with open("TEXT1.txt") as f1:
    set1 = set(f1.readlines())

with open("TEXT2.txt") as f2:
    set2 = set(f2.readlines())

nondups = set1 - set2

with open("MERGED.txt", "w") as out:
    out.writelines(nondups)

There are two problems here: 1) in order to remove duplicate words, you need to treat the input **as words**; `.readlines()` gives you **lines** (hence the name). Then, you need to use a set operation that actually makes sense for the desired result. `-` means "everything that is in `set1` *and not in* `set2`". You apparently want things that are in *either*; that is `|`. — Karl Knechtel, Sep 23 '22 at 14:42
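A minimal sketch of what this comment describes, assuming the words are separated by whitespace (spaces, tabs or line breaks) and that the order of the output does not matter. The helper names `read_words` and `merge_unique` are made up for illustration; they are not part of the original script.

```python
def read_words(path):
    """Return the set of whitespace-separated words found in the file."""
    with open(path) as f:
        # split() with no argument splits on any run of whitespace,
        # so spaces, tabs and line breaks are all treated the same way
        return set(f.read().split())


def merge_unique(path1, path2, out_path):
    """Write every word that occurs in either input file, once each."""
    merged = read_words(path1) | read_words(path2)  # union, not difference
    with open(out_path, "w") as out:
        # sets are unordered, so sort to get a reproducible file
        out.write("\n".join(sorted(merged)) + "\n")
    return merged
```

With the file names from the question, the call would be `merge_unique("TEXT1.txt", "TEXT2.txt", "MERGED.txt")`. The output here is one word per line in alphabetical order, which is only one possible layout.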

score 1 · Answer 1 · answered May 17 '22 at 09:19

Try this:

s1 = {1,2,3,4}
s2 = {3,4,5,6}

print(s1.intersection(s2))

Output: {3, 4}

You only need to change the line `nondups = set1 - set2` to `nondups = set1.intersection(set2)`.

answered May 17 '22 at 09:19

Jhanzaib Humayun


Thank you for your time and answer. I tried it, but what if I want a new file that is the merge of f1 and f2 without duplicates? – TforV May 17 '22 at 09:22
I don't know why it doesn't also give back the exact difference between f2 and f1 – TforV May 17 '22 at 09:24
What exactly do you mean? You only need to change the line `nondups = set1 - set2` in your code to `nondups = set1.intersection(set2)`. – Jhanzaib Humayun May 17 '22 at 09:24
If you want the difference, then you can use `s1.union(s2)-s1.intersection(s2)`. – Jhanzaib Humayun May 17 '22 at 09:26
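To make the difference between the set operations mentioned in this thread concrete, here is a small sketch using the example sets from this answer. The variable names are illustrative only. Note that a merge without duplicates corresponds to the union, while the expression in the last comment is the symmetric difference.

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}

common = s1 & s2      # intersection: elements in both sets
only_s1 = s1 - s2     # difference: elements in s1 but not in s2
either = s1 | s2      # union: elements in at least one set
exclusive = s1 ^ s2   # symmetric difference: elements in exactly one set

# the expression from the comment above is another spelling of s1 ^ s2
assert exclusive == s1.union(s2) - s1.intersection(s2)

print(sorted(common))     # [3, 4]
print(sorted(only_s1))    # [1, 2]
print(sorted(either))     # [1, 2, 3, 4, 5, 6]
print(sorted(exclusive))  # [1, 2, 5, 6]
```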

score 0 · Answer 2 · answered May 17 '22 at 09:24

Try this; I have commented the code accordingly:

# open files a.txt and b.txt and get the content as a list of lines
with open('a.txt') as f:
    a = f.readlines()

with open('b.txt') as f:
    b = f.readlines()

# get the string from the list
a_str = ''.join(a)
b_str = ''.join(b)

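# get sets of unique words; split() with no argument splits on any
# whitespace, so words on separate lines are not glued together by "\n"
a_set = set(a_str.split())
b_set = set(b_str.split())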

# merge sets
c_set = a_set.union(b_set)

# write to a new file
with open('c.txt', 'w') as f:
    f.write(' '.join(c_set))

answered May 17 '22 at 09:24

Geeky Quentin


Thank you, it works, but one more small detail... If the words in the two files are arranged in some order, like columns, is it possible to keep that arrangement? Overall, the answer works very well! Thank you – TforV May 17 '22 at 09:29
Do you mean to print different lines for the contents of both text files? – Geeky Quentin May 17 '22 at 09:30
Columns in particular. Let me try to explain better: in the two files these words can be arranged in columns, but in no precise order. So I would like the result to look similar – TforV May 17 '22 at 09:32
Column? Do you mean vertical, that is, every word is on a new line in the text file? – Geeky Quentin May 17 '22 at 09:35
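The thread does not settle what the column layout should be, so the following is only a sketch under one reading of it: the output has one word per line, and words appear in the order in which they are first seen (first file, then second). The function name `merge_in_order` is hypothetical.

```python
def merge_in_order(path1, path2, out_path):
    """Merge the words of two files, keeping first-seen order, one per line."""
    words = []
    for path in (path1, path2):
        with open(path) as f:
            # split() handles spaces, tabs and line breaks alike
            words.extend(f.read().split())
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so later duplicates are dropped without reordering the rest
    unique = list(dict.fromkeys(words))
    with open(out_path, "w") as out:
        out.write("\n".join(unique) + "\n")
    return unique
```

A plain `set` would also remove the duplicates, but it does not preserve the original order, which is why `dict.fromkeys` is used here instead.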
