The OP's code has the following issues.
(1) It checks letters rather than words in the following lines:
    for line in file2:
        line = line.strip("\n")
        words = line.split()
        for w in words:
            w = w.strip()
            if(w == word):
(2) It loops through file1 and file2 once for each word (very inefficient).
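To make issue (2) concrete, here is a minimal sketch comparing the two approaches on in-memory data. The names `lines`, `targets`, `slow` and `fast` are made up for illustration and do not come from the OP's code.

```python
from collections import Counter

# Hypothetical stand-ins for the file contents and the words to look up.
lines = ["the cat and the dog", "the end"]
targets = ["the", "and"]

# Pattern criticised in (2): every line is rescanned once per target word.
slow = {}
for target in targets:
    slow[target] = sum(w == target for line in lines for w in line.split())

# Single pass: each line is split once and Counter tallies the wanted words.
wanted = set(targets)
fast = Counter(w for line in lines for w in line.split() if w in wanted)

print(slow)
print(dict(fast))
```

Both approaches give the same counts, but the first reads the data once per target word while the second reads it once in total.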
Refactored code addressing the above issues
    from collections import Counter
    from ordered_set import OrderedSet  # third-party package (ordered-set on PyPI)
    import string

    # Utility functions
    def string_to_words(s):
        "Convert string to lower-case words without punctuation"
        # Remove punctuation, lower-case, and split on whitespace.
        # Punctuation removal taken from https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string
        return s.translate(str.maketrans('', '', string.punctuation)).lower().split()

    def update_count(s, valid_words, cnt=None):
        "Count the words in string s that appear in valid_words"
        if cnt is None:
            cnt = Counter()
        if s:
            # Use a generator (rather than a list comprehension) to update the counter,
            # see https://wiki.python.org/moin/Generators
            cnt.update(word for word in string_to_words(s) if word in valid_words)
        return cnt

    if __name__ == "__main__":
        # Main code body
        with open(r'words.txt', 'r') as file3:
            # Get the set of words that we want to include.
            # Use a set since membership tests are much quicker than with a list.
            # Use OrderedSet (rather than set) since it preserves the order in which
            # items were added, which allows output in the same order as words.txt.
            include_words = OrderedSet()
            for line in file3:
                include_words.update(string_to_words(line.rstrip()))

        with open(r'file1.txt', 'r') as file1:
            cnt1 = Counter()  # start with an empty counter so an empty file still works
            for line in file1:
                cnt1 = update_count(line.rstrip(), include_words, cnt1)

        with open(r'file2.txt', 'r') as file2:
            cnt2 = Counter()
            for line in file2:
                cnt2 = update_count(line.rstrip(), include_words, cnt2)

        with open(r'wordsInFiles.txt', 'w') as file4:
            for word in include_words:
                file4.write(f'{word} {cnt1[word]}\n')  # count in file1.txt
                file4.write(f'{word} {cnt2[word]}\n')  # count in file2.txt
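If installing the third-party ordered_set package is not an option, a plain dict can play the same role, since dict keys are unique, support fast membership tests, and keep insertion order from Python 3.7 onwards. The sketch below is an alternative, not part of the code above: `ordered_word_set` is a hypothetical helper name, and it only lower-cases and splits, leaving out punctuation removal for brevity.

```python
def ordered_word_set(lines):
    """Collect unique lower-case words from lines, keeping first-seen order."""
    seen = {}
    for line in lines:
        for word in line.lower().split():
            # setdefault adds the key only if it is not already present,
            # so the position of the first occurrence is kept.
            seen.setdefault(word, None)
    return seen

include_words = ordered_word_set(["When the", "battle when THE"])
print(list(include_words))        # ['when', 'the', 'battle']
print("battle" in include_words)  # True
```

Iterating over the dict yields the words in the order they first appeared, so the output loop would work unchanged.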
Example Usage
file1.txt
There are five known copies of the speech in Lincoln's handwriting,
each with a slightly different text, and named for the people who
first received them: Nicolay, Hay, Everett, Bancroft and Bliss. Two
copies apparently were written before delivering the speech, one of
which probably was the reading copy.
file2.txt
When shall we three meet again
In thunder, lightning, or in rain?
When the hurlyburly's done,
When the battle's lost and won.
That will be ere the set of sun.
words.txt (allows multiple words per line, ignores blank lines and punctuation)
There are
five known copies
When the
hurlyburly's done
When the battle's lost and won
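wordsInFiles.txt (output; for each word, the first line is its count in file1.txt and the second is its count in file2.txt)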
there 1
there 0
are 1
are 0
five 1
five 0
known 1
known 0
copies 2
copies 0
when 0
when 3
the 4
the 3
hurlyburlys 0
hurlyburlys 1
done 0
done 1
battles 0
battles 1
lost 0
lost 1
and 2
and 1
won 0
won 1
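The entries hurlyburlys and battles appear without apostrophes because punctuation is stripped before the words are compared. A small sketch of that step, using the same translate-based approach as string_to_words (the function name `strip_and_split` is only for this illustration):

```python
import string

def strip_and_split(s):
    # Delete every character in string.punctuation, lower-case, split on whitespace.
    return s.translate(str.maketrans('', '', string.punctuation)).lower().split()

print(strip_and_split("When the hurlyburly's done,"))
# ['when', 'the', 'hurlyburlys', 'done']
```

Because the same normalisation is applied to words.txt and to both input files, "hurlyburly's" in words.txt still matches "hurlyburly's" in file2.txt.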