Python word count: count the words listed in one file across two text files, and write each word + count to an output file

Question

I have 2 txt files that contain words (song lyrics, for example).

1 txt file that contains the words I want to count in those 2 files.

1 txt file that will receive each word and its count.

file1 = open(r'E:\Users\OneDrive\Desktop\python\file1.txt','r')
file2 = open(r'E:\Users\OneDrive\Desktop\python\file2.txt','r')
file3 = open(r'E:\Users\OneDrive\Desktop\python\words.txt','r')
file4 = open(r'E:\Users\OneDrive\Desktop\python\wordsInFiles.txt','w')

for word in file3:
    word = word.strip("\n")
    counter = 0
    counter2 = 0
    for line in file1:
        line = line.strip("\n")
        words = line.split()
        for w in words:
            w = w.strip()
            if(w == word):
                counter += 1
    file1.seek(0,0)
    for line in file2:
        line = line.strip("\n")
        words = line.split()
        for w in words:
            w = w.strip()
            if(w == word):
                counter2 += 1
    file4.write(word + " " + str(counter) + "\n")
    file4.write(word + " " + str(counter2) + "\n")
    file2.seek(0,0)

file1.close()
file2.close()
file3.close()
file4.close()

It writes each word twice, and the counts are incorrect.

Thanks to whoever helps.

Your code seems to be reading files 1 & 2 for each word in file3. Unless file3 only has a couple of words this seems very inefficient. — DarrylG, Jan 25 '20 at 15:37
Your counter is incorrect because in `words = line.split(); for w in words:` words is the words on a line, but w are the letters in a word. — DarrylG, Jan 25 '20 at 15:40
@DarrylG thanks for your response. I am sorry for being ignorant, I am kind of a beginner. I didn't quite understand what's wrong with the counter and this for loop. Also, in file 3 (words) I have around 8 words. — idan half, Jan 25 '20 at 16:37
@idanhalf--no problem. Take a look at my answer where I attempted to solve your major issues. — DarrylG, Jan 25 '20 at 16:44
Don't call `open()` and `.close()` by hand; use `open(filename)` in a `with` block so you don't have to think about calling `.close()` if there's an error in your program and stuff like that. — Boris Verkhovskiy, Jan 25 '20 at 17:03
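
A minimal sketch of the `with` pattern suggested in the last comment. The scratch file and its contents are made up here so the snippet runs anywhere; substitute your own paths.

```python
import os
import tempfile

# Hypothetical scratch file (an assumption for this sketch, not the OP's path)
path = os.path.join(tempfile.mkdtemp(), "file1.txt")

# The with block closes the file on exit, even if an exception is raised inside it
with open(path, "w") as f:
    f.write("hello world\nhello again\n")

with open(path) as f:
    # Collect every whitespace-separated token in the file
    words = [w for line in f for w in line.split()]

print(words)
print(f.closed)  # the file object is already closed here
```

No explicit `.close()` call is needed in either block.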

DarrylG · Answer 1 · 2020-01-25T17:46:43.930

The OP code has the following issues.

(1) It compares the raw tokens from `line.split()` with the target word, so tokens that differ in case or carry punctuation (e.g. `rain?` vs. `rain`) are never counted:

for line in file2:
    line = line.strip("\n")
    words = line.split()
    for w in words:
        w = w.strip()
        if(w == word):

(2) It loops through file1 and file2 once for each word (very inefficient).

Code refactored to address the above issues

from collections import Counter
from ordered_set import OrderedSet  # third-party package: pip install ordered-set
import string

# Utility Functions
def string_to_words(s):
  " Convert string to lower case words without punctuation "
  # Remove punctuation, lower case and split on whitespace
  # Using remove punctuation code from https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string
  return s.translate(str.maketrans('', '', string.punctuation)).lower().split()

def update_count(s, valid_words, cnt = None):
  " Count the words in string s that appear in valid_words "
  # Always return a Counter so callers can index it even for empty input
  if cnt is None:
    cnt = Counter()
  # Use generator (rather than list comprehension) to update counter i.e. https://wiki.python.org/moin/Generators
  cnt.update(word for word in string_to_words(s) if word in valid_words)
  return cnt

if __name__ == "__main__":
  # Main Code Body
  with open(r'words.txt','r') as file3:
    # Get set of words that we want to include
    # Use set since this is much quicker than a list to see if a word is in words
    # Use OrderedSet (rather than set) since this preserves the order of items
    # added which allows outputting in the same order as words in file words.txt
    include_words = OrderedSet()
    for line in file3:
      include_words.update(string_to_words(line.rstrip()))

  with open(r'file1.txt','r') as file1:
    cnt1 = None
    for line in file1:
      cnt1 = update_count(line.rstrip(), include_words, cnt1)

  with open(r'file2.txt','r') as file2:
    cnt2 = None
    for line in file2:
      cnt2 = update_count(line.rstrip(), include_words, cnt2)

  with open(r'wordsInFiles.txt','w') as file4:
    for word in include_words:
      file4.write(f'{word} {cnt1[word]}\n')
      file4.write(f'{word} {cnt2[word]}\n')

Example Usage

file1.txt

There are five known copies of the speech in Lincoln's handwriting, each with a slightly different text, and named for the people who first received them: Nicolay, Hay, Everett, Bancroft and Bliss. Two copies apparently were written before delivering the speech, one of which probably was the reading copy.

file2.txt

When shall we three meet again In thunder, lightning, or in rain?
When the hurlyburly's done,
When the battle's lost and won.

That will be ere the set of sun.

words.txt (allows multiple words per line, ignores blank lines and punctuation)

There are 
five known copies

When the 
hurlyburly's done
When the battle's lost and won

wordsInFiles.txt

there 1
there 0
are 1
are 0
five 1
five 0
known 1
known 0
copies 2
copies 0
when 0
when 3
the 4
the 3
hurlyburlys 0
hurlyburlys 1
done 0
done 1
battles 0
battles 1
lost 0
lost 1
and 2
and 1
won 0
won 1
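
If installing a third-party package is not an option, a plain `dict` can stand in for `OrderedSet`, since dicts keep insertion order in Python 3.7+. The following is only a sketch under that assumption: `ordered_unique_words` and `count_words` are hypothetical helper names, and the inline sample lines replace the real files so it runs on its own.

```python
import string
from collections import Counter

def string_to_words(s):
    """Lower-case s, drop ASCII punctuation, split on whitespace."""
    return s.translate(str.maketrans('', '', string.punctuation)).lower().split()

def ordered_unique_words(lines):
    """Unique words in first-seen order, using dict keys as an ordered set."""
    return list(dict.fromkeys(w for line in lines for w in string_to_words(line)))

def count_words(lines, valid_words):
    """Count only the words in valid_words across all lines."""
    valid = set(valid_words)  # set gives fast membership tests
    return Counter(w for line in lines for w in string_to_words(line) if w in valid)

# Inline sample data (not the real files) so the sketch is self-contained
words_lines = ["There are", "", "When the", "hurlyburly's done"]
text_lines = ["When the hurlyburly's done,", "When the battle's lost and won."]

include_words = ordered_unique_words(words_lines)
cnt = count_words(text_lines, include_words)
for word in include_words:
    print(word, cnt[word])
```

Words that never occur simply report 0, because a `Counter` returns 0 for missing keys.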

I don't think you need to pass a list to Counter, a generator expression is fine, you can just remove the `[]`. You should mention that OrderedSet is not in the Python standard library. — Boris Verkhovskiy, Jan 25 '20 at 17:17
@Boris--Agree. I just used a list to reduce the number of things for the newbie to learn in understanding the refactoring. — DarrylG, Jan 25 '20 at 17:33
@Boris--switched to a generator with a pointer to an explanation for OP. — DarrylG, Jan 25 '20 at 17:40

Boris Verkhovskiy · Answer 2 · 2020-01-28T00:20:03.627

1) Count all the words in all the files

2) Look at the file containing the words you're interested in and look up each word in the Counter object you got from step 1

from collections import Counter

input_filenames = [
    r"E:\Users\OneDrive\Desktop\python\file1.txt",
    r"E:\Users\OneDrive\Desktop\python\file2.txt",
]
file_with_words_youre_interested_in = r"E:\Users\OneDrive\Desktop\python\words.txt"
output_filename = r"E:\Users\OneDrive\Desktop\python\wordsInFiles.txt"


# A generator that yields all the words in a file one by one
def get_words(filename):
    with open(filename) as f:
        for line in f:
            yield from line.split()


filename_to_word_count = {
    filename: Counter(get_words(filename)) for filename in input_filenames
}

with open(file_with_words_youre_interested_in) as f:
    words_to_count = f.read().splitlines()

with open(output_filename, "w") as f:
    for word_to_count in words_to_count:
        for filename in input_filenames:
            f.write(f"{word_to_count} {filename_to_word_count[filename][word_to_count]}\n")

`line.split()` gives the same result as `line.strip().split()`. — Steven Rumbalski, Jan 27 '20 at 16:45
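
A quick check of the comment above, using a sample line made up for illustration. It also shows that exact-token counting, as in this answer, treats `won.` and `won` as different words, so normalizing case and punctuation may still be needed depending on your input.

```python
line = "  When the battle's lost and won.  \n"

# split() with no arguments splits on runs of whitespace and ignores
# leading/trailing whitespace, so calling strip() first changes nothing
same = line.split() == line.strip().split()
tokens = line.split()

print(same)
print(tokens)

# Punctuation stays attached to the token it touches
print(tokens[-1] == "won")
```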
