I have two text files that I'd like to compare, with any similarities listed in a new file. For example, if one file contains "123:example:example" and the other contains "123" and "example", I want "123" and "example" to count as matches and be listed in the new file. I think I need a split function to split each line at the colon, but I'm unsure where to put it. I tried putting it where the splitlines() call is, but that gives an error saying an integer is needed. I am very new to Python, so any advice or hints are appreciated!

This currently works, but only for lines without colons.

# Read both files into sets of lowercased lines; intersection() returns
# the set of lines the two files have in common.
file1 = set(open('file1.txt').read().lower().splitlines())
file2 = set(open('file2.txt').read().lower().splitlines())

same = file1.intersection(file2)

# Any matches are appended to a new file.
with open('result.txt', 'a') as new_file:
    for line in same:
        new_file.write(line + '\n')

More complete example:

**File 1:**
123:example:test
testing
abc

**File 2:**
test
456:testing
ABC

**Desired output in new file:**
test
testing
abc
rocotyco
  • Does this answer your question? [Split Strings into words with multiple word boundary delimiters](https://stackoverflow.com/questions/1059559/split-strings-into-words-with-multiple-word-boundary-delimiters) – Jack Bashford Nov 30 '20 at 23:13
  • Does this answer your question? [How to split a text file to its words in python?](https://stackoverflow.com/questions/19720311/how-to-split-a-text-file-to-its-words-in-python) – wjandrea Nov 30 '20 at 23:13
  • `line.split(':')[0]` will return the part before the `:` – Barmar Nov 30 '20 at 23:21
  • `set([line.split(':')[0] for line in open('file1.txt').read().lower().splitlines()])` – Barmar Nov 30 '20 at 23:22
  • @Barmar That works, am I able to get it to return the part after the : as well? – rocotyco Nov 30 '20 at 23:36

1 Answer

Split each line on the colon and put every resulting field into the set, using a nested list comprehension.

file1 = set([item for line in open('file1.txt').read().lower().splitlines() for item in line.split(':')])
file2 = set([item for line in open('file2.txt').read().lower().splitlines() for item in line.split(':')])

same = file1.intersection(file2)
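
For reference, here is a minimal end-to-end sketch of the same idea, run against the sample data from the question. The helper names `read_tokens` and `common_tokens` are hypothetical, the temporary directory stands in for wherever the real files live, and skipping empty fields and sorting the output are assumptions that the question does not ask for. It also opens the result file with `'w'` rather than `'a'`, so repeated runs do not accumulate duplicates.

```python
import tempfile
from pathlib import Path


def read_tokens(path, sep=":"):
    """Return the set of lowercased, non-empty fields found in a file."""
    tokens = set()
    with open(path) as handle:
        for line in handle:
            # Strip the newline, lowercase, then split the line on the separator.
            for item in line.strip().lower().split(sep):
                if item:  # assumption: ignore empty fields such as in "a::b"
                    tokens.add(item)
    return tokens


def common_tokens(path1, path2, sep=":"):
    """Return the fields that appear in both files."""
    return read_tokens(path1, sep) & read_tokens(path2, sep)


# Demo using the example files from the question, written to a temp directory.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "file1.txt"
    second = Path(tmp) / "file2.txt"
    first.write_text("123:example:test\ntesting\nabc\n")
    second.write_text("test\n456:testing\nABC\n")

    same = common_tokens(first, second)

    # Write the matches one per line; sorted only to make the order stable.
    result = Path(tmp) / "result.txt"
    with open(result, "w") as new_file:
        for item in sorted(same):
            new_file.write(item + "\n")

    written = result.read_text()
    print(written)
```

Because sets are unordered, the original loop writes matches in arbitrary order; drop the `sorted()` call if the order does not matter.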
Barmar
  • This is such a huge help, thank you! This is giving me what comes before the `:`. Am I able to get what comes after as well, including when there are multiple colons? For example, if I have "a:b:c" in one file and "a" and "c" in another, "a" and "c" would be in the new file? – rocotyco Dec 01 '20 at 00:06
  • I thought you just wanted to compare the first item on all the lines between the two files. – Barmar Dec 01 '20 at 00:11
  • You've changed the question, I was answering the original version. – Barmar Dec 01 '20 at 00:12
  • Yes, I just changed it, sorry about that. I realized I worded it wrong. – rocotyco Dec 01 '20 at 00:14
  • Now I don't understand the question. Could you show a more complete example, with input files with more than one line and the desired output file? – Barmar Dec 01 '20 at 00:17
  • I updated the question with a new example at the bottom. Hopefully this makes more sense. I appreciate you helping out this much! I feel your first answer is almost it. – rocotyco Dec 01 '20 at 00:23
  • I'm getting an error, "NameError: name 'line' is not defined". Trying to figure this out. – rocotyco Dec 01 '20 at 00:51
  • Sorry, I got the multilevel list comprehension syntax wrong. – Barmar Dec 01 '20 at 00:56