
I want to count the number of times each word is found in a text file, and I'm not sure what is wrong. When I run the code below, the count comes out as 0. I'm also having trouble finding a way to include capitalized occurrences in the count (that is, count both dog and Dog).

def main():
    text_file = open("textfile.txt", "r")

    dog_count = 0
    cat_count = 0

    for word in text_file.readlines():
        if word == 'dog':
            dog_count= dog_count + 1
        else:
            dog_count= dog_count

    print('the word dog occurs',dog_count,'times')
Chris Martin
jrhall
  • Possible duplicate of [Efficiently count word frequencies in python](http://stackoverflow.com/questions/35857519/efficiently-count-word-frequencies-in-python) – Julien Mar 22 '17 at 03:56
  • You're iterating over lines instead of words. – Julien Mar 22 '17 at 03:56

3 Answers


I believe your problem is that you are looping over the lines of the file and not the words. Each line is a whole string with its trailing newline, so it never equals 'dog'. You need to add another loop to go through each word in the line.

Warning: the example below is untested but should be close enough.

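def main():
    dog_count = 0
    cat_count = 0

    # The with block closes the file automatically when the loop is done.
    with open("textfile.txt", "r") as text_file:
        for line in text_file:
            # Split each line on whitespace to get the individual words.
            for word in line.split():
                if word == 'dog':
                    dog_count = dog_count + 1

    print('the word dog occurs', dog_count, 'times')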
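
To see why the original loop reports 0, it may help to compare what each loop actually yields. The sketch below uses an in-memory `io.StringIO` object as a stand-in for the file; the sample text is made up for illustration.

```python
import io

# Made-up sample text standing in for textfile.txt.
sample = "the dog barks\ndog\n"

# readlines() yields whole lines, each with its trailing newline.
lines = io.StringIO(sample).readlines()
line_matches = sum(1 for line in lines if line == 'dog')

# Splitting each line on whitespace yields the individual words.
word_matches = sum(1 for line in lines for word in line.split() if word == 'dog')

print(lines)         # ['the dog barks\n', 'dog\n']
print(line_matches)  # 0
print(word_matches)  # 2
```

Even the line that contains only the word is read as 'dog\n', so the comparison against 'dog' fails until the line is split.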
pgreen2
  • You are correct, it will not do that. I would point you to https://stackoverflow.com/questions/2779453/python-strip-everything-but-spaces-and-alphanumeric for examples of stripping out the punctuation. – pgreen2 Mar 22 '17 at 04:00
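
As a rough sketch of one way to handle punctuation (not taken from the linked question), each token can be stripped of leading and trailing punctuation with `str.strip` and `string.punctuation` before comparing. The helper name `count_word` and the sample text are hypothetical, and `target` is assumed to be lower case.

```python
import string

def count_word(text, target):
    """Count occurrences of target, ignoring case and surrounding punctuation."""
    count = 0
    for token in text.split():
        # Strip punctuation from both ends only; "dog's" stays "dog's".
        cleaned = token.strip(string.punctuation).lower()
        if cleaned == target:
            count += 1
    return count

sample = "Dog, dog! My dog's bowl. (dog)"
print(count_word(sample, "dog"))  # 3
```

Punctuation inside a token is left alone, so possessives such as "dog's" are not counted here; whether they should be is a design choice.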

You can convert each word to lower (or upper) case before comparing it:

def main():
    text_file = open("textfile.txt", "r")

    dog_count = 0
    cat_count = 0

    for line in text_file.readlines():
        for word in line.split():
            word = word.lower()  # case conversion
            if word == 'dog':
                dog_count = dog_count + 1

    text_file.close()
    print("The word dog occurs", dog_count, "times")

main()

It should work fine; I tested it and it works for me. :)
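
If the file is small enough to read into memory, a shorter variant (an assumption, not part of the answer above) is to lower-case the whole text once and use `list.count`. The sample string below stands in for the result of `text_file.read()`.

```python
# Made-up text standing in for the contents of textfile.txt.
text = "Dog chases cat\nthe dog sleeps\nDOG\n"

# Lower-case once, split on any whitespace (newlines included), then count.
words = text.lower().split()
dog_count = words.count('dog')

print('the word dog occurs', dog_count, 'times')
```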

Nooty

Answer: As for why the output is wrong, you need to iterate through every word in each line, not through the lines themselves.

Suggestion: When you are searching for multiple words, you can keep them in a dict and store each count as the value of the corresponding key.

Content of file:

Hi this is hello
Hello is my name

Then

text_file.readlines()

will give,

['Hi this is hello\n', 'Hello is my name\n']

text_file.read().splitlines()
['Hi this is hello', 'Hello is my name']

Then split every line into words (in Python 3, map returns an iterator, so wrap it in list to see the result),

lines = list(map(str.split, text_file.read().splitlines()))
[['Hi', 'this', 'is', 'hello'], ['Hello', 'is', 'my', 'name']]

On chaining the per-line lists together (with itertools imported as it),

list(it.chain.from_iterable(map(str.split, text_file.read().splitlines())))
['Hi', 'this', 'is', 'hello', 'Hello', 'is', 'my', 'name']

And,

search = ['dog', 'cat']  # the words that you need to count
search = dict.fromkeys(search, 0)  # gives the dict {'dog': 0, 'cat': 0}

Therefore for your problem,

import itertools as it

def main():
    search = dict.fromkeys(['cat', 'dog'], 0)  # {'cat': 0, 'dog': 0}
    with open("textfile.txt", "r") as text_file:
        lines = text_file.read().splitlines()
    # Flatten the per-line word lists into one stream of words.
    for word in it.chain.from_iterable(map(str.split, lines)):
        if word.lower() in search:
            search[word.lower()] = search[word.lower()] + 1
    for word, count in search.items():
        print('the word %s occurs %d times' % (word, count))

This counts words regardless of case too, so both dog and Dog are included!
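
For comparison, the standard library's `collections.Counter` can do the tallying. This is a minimal sketch rather than the approach shown above; the function name `count_words` and the sample text are made up for illustration.

```python
from collections import Counter

def count_words(text, targets):
    """Return a dict mapping each target word to its case-insensitive count."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for missing keys, so absent words are reported as 0.
    return {word: counts[word] for word in targets}

sample = "Dog cat dog\nbird Dog\n"
print(count_words(sample, ['dog', 'cat', 'fish']))
# {'dog': 3, 'cat': 1, 'fish': 0}
```

With a real file, the `text` argument would be the string returned by reading the file; the targets are assumed to be lower case.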

Hope it helps!

Keerthana Prabhakaran