How to count word frequencies from an input file?

Question

I'm trying to have my program read a single line of words separated by commas. For example, if we have:

hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy

in the input file, the program would need to put each word on its own line and drop the commas. After that, the program would count the frequency of each word in the input file.

f = open('input1.csv')  # create file object
userInput = f.read()
seperated = userInput.split(',')
for word in seperated:
    freq = seperated.count(word)
    print(word, freq)

The problem with this code is that it prints a word again every time it reappears, instead of printing it once with its final count. The output for this program would be:

hello 1
cat 2
man 2
hey 2
dog 2
boy 1
Hello 1
man 2
cat 2
woman 1
dog 2
Cat 1
hey 2
boy
1

The correct output would be:

hello 1
cat 2
man 2
hey 2
dog 2
boy 2
Hello 1
woman 1
Cat 1

My question is: how do I make my output look more polished by printing each word once, with its final count instead of the initial one?

Look into storing the intermediate counts in a data structure (a dictionary is nice). – Christian Sloper, Dec 10 '20 at 17:12
Does this answer your question? [How can I count the occurrences of a list item?](https://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item) (Second answer) — Brian McCutchon, Dec 10 '20 at 17:34

J Bernardi · Answer 1 · 2020-12-10T18:00:17.663

This is a common pattern and core programming skill. You should try collecting and counting words each time you encounter them, in a dictionary. I'll give you the idea, but it's best you practise the exact implementation yourself. Happy hacking!

(I also recommend pprint, the "pretty print" module in Python's standard library.)

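import pprint
from collections import defaultdict

word_dict = defaultdict(int)  # unseen words start at 0
with open('input1.csv') as f:
    for word in f.read().strip().split(','):  # strip() drops the trailing newline
        word_dict[word] += 1  # bump the count on every encounter
pprint.pprint(dict(word_dict))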

A couple of extra tips: you may want to f.close() your file when you're finished (E: I misread so disregard the rest...), and it looks like you may want to convert your words to lower case so that different capitalisations aren't counted separately. Python has built-in string methods for this, which you can find by searching.
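As a small illustration of both tips, here is a minimal sketch. The helper name `read_words` and its `fold_case` flag are made up for this example, and it assumes the file is small enough to read in one go. A `with` block closes the file on exit, so no explicit f.close() is needed, and str.lower() is applied only when capitalisation should be ignored.

```python
import os
import tempfile


def read_words(path, fold_case=False):
    """Return the comma-separated words stored in the file at `path`.

    Hypothetical helper for illustration only.
    """
    # The with block closes the file automatically, even if an error occurs.
    with open(path) as f:
        text = f.read().strip()  # drop the newline that ends the file
    if fold_case:
        text = text.lower()  # 'Hello' and 'hello' become the same word
    return text.split(',')


# Demo on a throwaway file so the sketch runs without input1.csv.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('hello,cat,Hello,Cat\n')
demo_path = tmp.name

kept = read_words(demo_path)                    # capitalisation preserved
folded = read_words(demo_path, fold_case=True)  # capitalisation ignored
os.remove(demo_path)

print(kept)
print(folded)
```

For the output the asker wants, where 'hello' and 'Hello' stay separate, `fold_case` would be left at its default.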

Converting to lower case is not important here since the correct output requires both 'hello' and 'Hello'. Thanks for the suggestions, though. — zhu, Dec 10 '20 at 17:55

score 0 · Answer 2 · answered Dec 10 '20 at 17:22

Try using a dictionary:

f = open('input1.csv')  # create file object
userInput = f.read()
f.close()
seperated = userInput.split(',')
wordsDict = {}
for word in seperated:
    if word not in wordsDict:
        wordsDict[word] = 1  # first time this word is seen
    else:
        wordsDict[word] += 1  # seen before: bump its count
for i in wordsDict:
    print(i, wordsDict[i])  # print() call, matching the question's Python 3 code

answered Dec 10 '20 at 17:22

Sergio García


I see what you're trying to do here but it printed out the same output as my original code. It didn't solve the problem. Thanks for suggesting, though. – zhu Dec 10 '20 at 17:56
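One likely reason the counts still look wrong is that f.read() keeps the newline at the end of the file, so the final word is 'boy\n' rather than 'boy'. This is an assumption based on the posted output, where the last `boy` and its `1` land on separate lines. A minimal sketch, using an in-memory string in place of the file (`user_input` and `count_words` are names made up for this example):

```python
# Stand-in for f.read(): the sample line plus the newline that ends the file.
user_input = 'hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy\n'

raw_words = user_input.split(',')
print(repr(raw_words[-1]))  # repr() makes the hidden newline visible

# strip() removes leading/trailing whitespace before splitting.
clean_words = user_input.strip().split(',')


def count_words(words):
    """Tally each word in a plain dict, keeping first-seen order."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


raw_counts = count_words(raw_words)
clean_counts = count_words(clean_words)

# Without strip(), 'boy' and 'boy\n' are counted as two different words.
print(raw_counts['boy'], clean_counts['boy'])
```

If that is the cause, calling strip() on the text before split(',') should be enough for the dictionary approach to produce the expected counts.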

Amol Manthalkar · Answer 3 · 2020-12-10T17:31:05.373

Create a new dictionary, then add each word as a key with its count as the value:

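# `seperated` is the list of words built in the question's code
count_dict = {}
for w in seperated:
    # a repeated word simply overwrites its entry with the same count
    count_dict[w] = seperated.count(w)
for key, value in count_dict.items():
    print(key, value)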

edited Dec 10 '20 at 17:31

answered Dec 10 '20 at 17:24

Amol Manthalkar


I see how you implemented it in a dictionary but it printed out the same output as my original code. Thanks for the suggestion, though. – zhu Dec 10 '20 at 17:56
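An alternative to a hand-rolled dictionary is collections.Counter from the standard library, which tallies the whole list in one pass instead of calling list.count once per word. A minimal sketch, assuming Python 3.7+ (where a Counter keeps words in first-seen order) and using a string, here named `sample`, in place of the file contents:

```python
from collections import Counter

# Stand-in for the text returned by f.read() on input1.csv.
sample = 'hello,cat,man,hey,dog,boy,Hello,man,cat,woman,dog,Cat,hey,boy\n'

# strip() removes the trailing newline so the last word is 'boy', not 'boy\n'.
counts = Counter(sample.strip().split(','))

# Each distinct word is printed once, with its final count.
for word, freq in counts.items():
    print(word, freq)
```

Counting stays case-sensitive here, so 'hello', 'Hello', 'cat' and 'Cat' remain separate entries, as the expected output in the question requires.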
