Although you changed a few of the requirements, it appears you want this:
- to read a list of keywords from a file that has them on a single line, separated by spaces
- to find the lines of a text document that contain any of these keywords and, for each such line, output its line number and exactly which keywords appear on it
This script does that (it uses the `:=` assignment expression, so it needs Python 3.8 or later):
```python
with open('keywords.txt') as f:
    keywords = f.read().split()
with open('document.txt') as f, open('output.txt', 'w') as o:
    for n, line in enumerate(f):
        # substring test: 'fox' also matches 'foxy' and 'foxhole'
        if matches := [k for k in keywords if k in line]:
            o.write(f'{n+1}: {matches}\n')
```
With `keywords.txt` something like:
```
fox dog
```
And `document.txt` something like:
```
the quick brown fox
jumped over the lazy dog
on a beautiful dog day afternoon, you foxy dog
there is nothing on FOX
and sometimes you're in a foxhole with a dog
```
It will write `output.txt` with:
```
1: ['fox']
2: ['dog']
3: ['fox', 'dog']
5: ['fox', 'dog']
```
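To look at the substring matching in isolation, here is a minimal sketch that runs the same comprehension on an in-memory copy of the sample document instead of files. The function name `find_substring_matches` is hypothetical, and `enumerate(..., start=1)` stands in for the `n+1` used above; the printed lines should mirror `output.txt`.

```python
def find_substring_matches(lines, keywords):
    """Return (line_number, keywords) pairs for lines containing a keyword as a substring."""
    results = []
    for n, line in enumerate(lines, start=1):
        # 'k in line' is a substring test, so 'fox' is also found in 'foxy' and 'foxhole'
        if matches := [k for k in keywords if k in line]:
            results.append((n, matches))
    return results


keywords = "fox dog".split()
document = [
    "the quick brown fox",
    "jumped over the lazy dog",
    "on a beautiful dog day afternoon, you foxy dog",
    "there is nothing on FOX",
    "and sometimes you're in a foxhole with a dog",
]

for n, matches in find_substring_matches(document, keywords):
    print(f"{n}: {matches}")
```

Line 5 is reported with `fox` only because `'fox' in line` is true for `foxhole`, and line 4 is skipped because the test is case-sensitive.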
If you don't want partial matches (like `foxhole`), if you care about the order in which the words were found and perhaps want to know about duplicates as well, and if you want to make sure capitalisation doesn't matter:
```python
with open('keywords.txt') as f:
    keywords = [k.lower() for k in f.read().split()]
with open('document.txt') as f, open('output.txt', 'w') as o:
    for n, line in enumerate(f):
        # compare whole words, ignoring case, in the order they appear in the line
        if matches := [w for w in line.split() if w.lower() in keywords]:
            o.write(f'{n+1}: {matches}\n')
```
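The output of this version isn't shown, so here is a sketch of the same whole-word, case-insensitive check on the five sample lines, again with a hypothetical helper (`find_word_matches`) and in-memory strings. It also keeps the keywords in a set rather than a list, which gives the same results with faster lookups.

```python
def find_word_matches(lines, keywords):
    """Return (line_number, words) pairs using whole-word, case-insensitive matching."""
    # Lower-case the keywords once; a set makes each membership test cheap
    wanted = {k.lower() for k in keywords}
    results = []
    for n, line in enumerate(lines, start=1):
        # split() yields whole words, so 'foxy' and 'foxhole' no longer count as 'fox';
        # matched words keep document order, original capitalisation and duplicates
        if matches := [w for w in line.split() if w.lower() in wanted]:
            results.append((n, matches))
    return results


keywords = "fox dog".split()
document = [
    "the quick brown fox",
    "jumped over the lazy dog",
    "on a beautiful dog day afternoon, you foxy dog",
    "there is nothing on FOX",
    "and sometimes you're in a foxhole with a dog",
]

for n, matches in find_word_matches(document, keywords):
    print(f"{n}: {matches}")

# Punctuation stays attached to words, so 'dog!' is not recognised as 'dog'
print(find_word_matches(["you foxy dog!"], keywords))
```

This should print `1: ['fox']`, `2: ['dog']`, `3: ['dog', 'dog']`, `4: ['FOX']` and `5: ['dog']`, then `[]` for the punctuated line, which is the gap that stripping punctuation closes.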
And finally, perhaps your `document.txt` gets a sixth line with punctuation:
```
I watch "FOX", but although I search doggedly, I can't find a thing, you foxy dog!
```
Then this script:
```python
import re
import string

with open('keywords.txt') as f:
    keywords = [k.lower() for k in f.read().split()]
with open('document.txt') as f, open('output.txt', 'w') as o:
    for n, line in enumerate(f):
        # delete all ASCII punctuation (escaped for use inside a character class), then split into words
        words = re.sub('[' + re.escape(string.punctuation) + ']', '', line).split()
        if matches := [w for w in words if w.lower() in keywords]:
            o.write(f'{n+1}: {matches}\n')
```
It writes this to `output.txt`:
```
1: ['fox']
2: ['dog']
3: ['dog', 'dog']
4: ['FOX']
5: ['dog']
6: ['FOX', 'dog']
```
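Two possible alternatives to the `re.sub` step, sketched on an in-memory copy of the six sample lines (the helper names `translate_matches` and `boundary_matches` are made up for illustration): `str.translate` with a table from `str.maketrans` deletes the same `string.punctuation` characters without a regular expression, while a `\b` word-boundary pattern leaves the text untouched and matches whole keywords directly. Both should reproduce the output above for this sample.

```python
import re
import string

keywords = ["fox", "dog"]
document = [
    "the quick brown fox",
    "jumped over the lazy dog",
    "on a beautiful dog day afternoon, you foxy dog",
    "there is nothing on FOX",
    "and sometimes you're in a foxhole with a dog",
    'I watch "FOX", but although I search doggedly, I can\'t find a thing, you foxy dog!',
]

wanted = {k.lower() for k in keywords}

# Option 1: a translation table that deletes every character in string.punctuation
strip_punct = str.maketrans("", "", string.punctuation)

def translate_matches(line):
    return [w for w in line.translate(strip_punct).split() if w.lower() in wanted]

# Option 2: keep punctuation and match whole keywords between \b word boundaries;
# re.escape protects against keywords that contain regex metacharacters
keyword_re = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
    re.IGNORECASE,
)

def boundary_matches(line):
    # findall returns the matched text as written, e.g. 'FOX' rather than 'fox'
    return keyword_re.findall(line)

for n, line in enumerate(document, start=1):
    if matches := translate_matches(line):
        print(f"{n}: {matches}")
```

The two are not equivalent in general: stripping punctuation turns `fox-hunting` into `foxhunting` (no match) and would let a keyword `cant` match `can't`, whereas the `\b` pattern finds `fox` in `fox-hunting` and does not match `cant` against `can't`. Which behaviour you want depends on how such words should be treated.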