1

I have built a text string, removed all non-alphabetical symbols, and added whitespace between the words. But when I add the words to a dictionary to count their frequency, it counts the letters instead. How do I count the words using a dictionary?

dictionary = {}  

for item in text_string:
    if item in dictionary:
        dictionary[item] = dictionary[item]+1
    else:
        dictionary[item] = 1
print(dictionary)
Aran-Fey
  • How is Python supposed to know what a "word" is? If you want to iterate over words, you have to split the string into words first. – Aran-Fey Apr 26 '18 at 09:31
  • Just print stuff to see what you are working on. Do not suppose it should work. – Christophe Roussy Apr 26 '18 at 09:35
  • The iterator on the built-in `str` yields each character one by one, which explains what you are getting. To iterate over words, use `split` as others have mentioned in their answers! – Valentin B. Apr 26 '18 at 09:41

2 Answers

3

Change this

for item in text_string:

to this

for item in text_string.split():

The .split() method, called with no arguments, splits the string into words, using runs of whitespace characters (including tabs and newlines) as delimiters.
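
To see the difference, here is a minimal sketch (the string `sample` is made up for illustration and is not from the question). Iterating over a string directly yields single characters, while `.split()` yields whole words and treats any run of spaces, tabs, or newlines as one delimiter:

```python
sample = "one two\tthree\nfour  five"  # hypothetical sample text

# Iterating over a str yields one character at a time, spaces included.
chars = [ch for ch in "one two"]
print(chars)  # ['o', 'n', 'e', ' ', 't', 'w', 'o']

# split() with no arguments breaks on any run of whitespace.
words = sample.split()
print(words)  # ['one', 'two', 'three', 'four', 'five']
```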

Ivan Vinogradov
1

You are very close. Since you state that your words are already whitespace-separated, you need to use str.split to turn the string into a list of words.

An example is below:

dictionary = {}  

text_string = 'there are repeated words in this sring with many words many are repeated'

# Split on whitespace so the loop sees whole words, not single characters.
for item in text_string.split():
    if item in dictionary:
        dictionary[item] += 1
    else:
        dictionary[item] = 1

print(dictionary)

{'there': 1, 'are': 2, 'repeated': 2, 'words': 2, 'in': 1,
 'this': 1, 'sring': 1, 'with': 1, 'many': 2}

Another solution is to use collections.Counter, available in the standard library:

from collections import Counter

text_string = 'there are repeated words in this sring with many words many are repeated'

c = Counter(text_string.split())

print(c)

Counter({'are': 2, 'repeated': 2, 'words': 2, 'many': 2, 'there': 1,
         'in': 1, 'this': 1, 'sring': 1, 'with': 1})
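
If you only need the most frequent words, a Counter can report them directly. The following is a small sketch reusing the same string (the names `top_three` and `'absent'` are just for illustration): `most_common(n)` returns the n highest counts as (word, count) pairs, and the Counter can still be indexed like a dictionary.

```python
from collections import Counter

text_string = 'there are repeated words in this sring with many words many are repeated'

c = Counter(text_string.split())

# The three most frequent words as (word, count) pairs.
top_three = c.most_common(3)
print(top_three)

# Counter is a dict subclass; a missing word counts as 0 instead of raising KeyError.
print(c['words'], c['absent'])
```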
jpp