0

This may seem like a trivial and already-asked question, but perhaps it will be helpful. I should point out that it relates to an existing question, for which I provide the link: comparing two text files and remove duplicates in python

Problem: I have two .txt files that both contain words arranged in lists (about 3 columns). I have used the script attached below, which is based on the conversation in the link, but it doesn't actually return a file that is the result of the comparison.

Let me explain: the goal is to generate a file that contains the words from both files, without duplicates.

I hope I have been sufficiently clear, and I thank anyone willing to help me.

This is the script, which doesn't achieve my goal:

with open("TEXT1.txt") as f1:
    set1 = set(f1.readlines())

with open("TEXT2.txt") as f2:
    set2 = set(f2.readlines())

nondups = set1 - set2

with open("MERGED.txt", "w") as out:
    out.writelines(nondups)
TforV
  • There are two problems here: 1) in order to remove duplicate words, you need to treat the input **as words**; `.readlines()` gives you **lines** (hence the name). Then, you need to use a set operation that actually makes sense for the desired result. `-` means "everything that is in `set1` *and not in* `set2`". You apparently want things that are in *either*; that is `|`. – Karl Knechtel Sep 23 '22 at 14:42
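
The two points in the comment above can be sketched as follows. This is a minimal illustration, not the asker's or commenter's code: the helper names `read_words` and `merge_unique_words` are hypothetical, and writing the result sorted, one word per line, is an assumption made only to get stable output.

```python
from pathlib import Path


def read_words(path):
    """Return the set of whitespace-separated words in the file at `path`."""
    # str.split() with no argument splits on spaces, tabs and newlines,
    # so words laid out in columns are separated into individual words.
    return set(Path(path).read_text().split())


def merge_unique_words(path1, path2, out_path):
    """Write every word found in either input file to `out_path`, once each."""
    merged = read_words(path1) | read_words(path2)  # `|` is set union
    # Sets are unordered; sorting here is an assumption for stable output.
    Path(out_path).write_text("\n".join(sorted(merged)) + "\n")
    return merged
```

With the file names from the question, this would be called as `merge_unique_words("TEXT1.txt", "TEXT2.txt", "MERGED.txt")`.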

2 Answers

1

Try this:

s1 = {1,2,3,4}
s2 = {3,4,5,6}

print(s1.intersection(s2))

Output: {3, 4}

You only need to change the line `nondups = set1 - set2` to `nondups = set1.intersection(set2)`.
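
For comparison, here is a small sketch of what the three relevant set operations return on the same example sets. Note that `intersection` keeps only the items present in both sets; if the goal is every item from either set with no duplicates, as the question describes, that appears to correspond to `union` instead. The variable names `difference`, `common` and `merged` are chosen here for illustration.

```python
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}

difference = s1 - s2  # items in s1 that are not in s2
common = s1 & s2      # same as s1.intersection(s2): items in both sets
merged = s1 | s2      # same as s1.union(s2): items in either set, once each

print(sorted(difference))  # [1, 2]
print(sorted(common))      # [3, 4]
print(sorted(merged))      # [1, 2, 3, 4, 5, 6]
```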

Jhanzaib Humayun
0

Try this; I have commented the code accordingly:

# open files a.txt and b.txt and get the content as a list of lines
with open('a.txt') as f:
    a = f.readlines()

with open('b.txt') as f:
    b = f.readlines()

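# join the lines back into a single string per file
a_str = ''.join(a)
b_str = ''.join(b)

# get sets of unique words; split() with no argument splits on any
# whitespace (spaces, tabs, newlines), so words on separate lines are
# separated too, which split(" ") would not do
a_set = set(a_str.split())
b_set = set(b_str.split())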

# merge sets
c_set = a_set.union(b_set)

# write to a new file
with open('c.txt', 'w') as f:
    f.write(' '.join(c_set))
Geeky Quentin
  • Thank you, it works, but one more small detail: if the words in the two files follow a layout, such as columns, is it possible to keep the same layout? Overall the answer works very well! Thank you – TforV May 17 '22 at 09:29
  • Do you mean to print different lines for the contents of both text files? – Geeky Quentin May 17 '22 at 09:30
  • Columns in particular. Let me try to explain better: in the two files the words may be arranged in a column, but in no precise order. So I would like the result to look similar – TforV May 17 '22 at 09:32
  • Column? Do you mean vertical, that is, every word is on a new line in the text file? – Geeky Quentin May 17 '22 at 09:35
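
If the layout discussed in these comments is indeed one word per line (this was never confirmed in the thread, so it is an assumption), a merge that keeps the words in first-seen order and writes them back as a single column could be sketched as below. The function names `merge_keep_order` and `merge_files` are hypothetical.

```python
def merge_keep_order(lines1, lines2):
    """Merge two iterables of lines, dropping duplicates and blank lines
    while keeping each word in the position where it was first seen."""
    words = (line.strip() for line in list(lines1) + list(lines2))
    # dict keys keep insertion order (Python 3.7+), so dict.fromkeys
    # behaves like an ordered set here.
    return list(dict.fromkeys(w for w in words if w))


def merge_files(path1, path2, out_path):
    """Read two one-word-per-line files and write the merged column."""
    with open(path1) as f1, open(path2) as f2:
        merged = merge_keep_order(f1, f2)
    with open(out_path, "w") as out:
        out.write("\n".join(merged) + "\n")
    return merged
```

With the file names from the answer, this would be called as `merge_files("a.txt", "b.txt", "c.txt")`. A plain `set` does not guarantee any order, which is why an insertion-ordered `dict` is used in this sketch.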