I have been practicing Python for the first time and I ran into this problem. I put a small paragraph into a variable text and split it on spaces, so now I have the words of that paragraph in a list. Next I used a dictionary to count the number of occurrences of each word in the paragraph. My ultimate goal is to make a new list of the words that appear more than 'x' times.

My code is:

text = '''Population refers to the number of individuals in a particular 
place. It could be the number of humans or any other life form living in a 
certain specified area. The number of tigers living in a forest is 
therefore referred to as the population of tigers in the forest. The 
number of people living in a town or city or an entire country is the 
human population in that particular area.'''

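words = text.split(" ")  # split the paragraph on single spaces
a = dict()  # maps each word to its number of occurrences
for word in words:
  if word not in a:
    a[word] = 1
  else:
    a[word] += 1

newlist = list()
val = 7  # threshold: keep words that appear more than val times
for key, value in a.items():
  if a[key] > val:
    newlist.append(key)  # append the word itself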

The final output that I receive after executing the last line is:

['years.', 'years.', 'years.', 'years.']

I don't know where I am going wrong.

1 Answer

To create a dict with words as keys and their number of occurrences as values, one approach is to get all the unique words first. You can do that with Python's built-in set type.

Then, iterate over that set and use the count method of the list to get the number of occurrences of each word.

You can see that below:

text = '''Population refers to the number of individuals in a particular 
place. It could be the number of humans or any other life form living in a 
certain specified area. The number of tigers living in a forest is 
therefore referred to as the population of tigers in the forest. The 
number of people living in a town or city or an entire country is the 
human population in that particular area.'''

words = text.split()  # Split text on whitespace and create a list of all words
wordset = set(words)  # Get all unique words
wordDict = {word: words.count(word) for word in wordset}  # Map each unique word to its number of occurrences

for key, value in wordDict.items():
    print(key + ' : ' + str(value))

This will give you (the order may differ between runs, since sets are unordered):

individuals : 1
forest : 1
the : 5
could : 1
therefore : 1
place. : 1
form : 1
or : 3
country : 1
population : 2
humans : 1
The : 2
city : 1
living : 3
Population : 1
life : 1
in : 6
a : 4
refers : 1
tigers : 2
is : 2
to : 2
be : 1
an : 1
other : 1
as : 1
particular : 2
number : 4
human : 1
It : 1
any : 1
forest. : 1
town : 1
that : 1
certain : 1
of : 5
entire : 1
people : 1
specified : 1
referred : 1
area. : 2

Then you can apply your own filters to get all words that appear more than x times.
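
As a minimal sketch of that last step, the filter could look like the following. The names `word_counts`, `threshold` and `frequent_words` are illustrative and not from the original code, and a short made-up sentence stands in for the paragraph above.

```python
# Hypothetical sample text; any string works here.
text = "the cat and the dog and the bird"

words = text.split()
word_counts = {word: words.count(word) for word in set(words)}

threshold = 1  # assumed value of x: keep words that appear more than this many times

# Keep only the words whose count is strictly greater than the threshold.
frequent_words = [word for word, count in word_counts.items() if count > threshold]

# Sorted before printing, because iteration order over a set is not fixed.
print(sorted(frequent_words))  # ['and', 'the']
```

For longer texts, `collections.Counter` from the standard library does the counting in a single pass, whereas calling `words.count` once per unique word rescans the whole list each time.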

Vasilis G.
  • I read through the documentation for set, and I am still confused about its significance. Compared to my code, why is set different, and why does it work when my code does not? – david smiths Jun 14 '19 at 21:08
  • `set` takes a list and removes all duplicate elements, keeping each element only once. Running your code produces an empty list because no element occurs more than `7` times. A small remark: since you're iterating with `items()`, you can use the value of the current item directly and change your condition to `if value > val`. – Vasilis G. Jun 15 '19 at 07:46
  • Thank you for explaining that. Also, the section of my code from where I declare an empty dictionary "a" up to a[word]+=1: shouldn't it do the same thing as set does? – david smiths Jun 15 '19 at 09:46
  • Yes, its keys end up being the same unique words that set gives you; you just have to check for each item whether it's already in the dictionary or not. – Vasilis G. Jun 15 '19 at 13:35
  • Thank you. Also, the piece of code you wrote where you create the dictionary is somewhat complex and new to me. As I understand it: for every word in the wordset, take the word, count its occurrences, and add both to the dictionary. Is that correct, or does it mean something else? – david smiths Jun 15 '19 at 15:12
  • Yes, this is exactly what it does. If this line of code confuses you, take a look at [list comprehensions](https://www.python.org/dev/peps/pep-0202/#id6). – Vasilis G. Jun 15 '19 at 16:08
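
To make the exchange above concrete, here is a small sketch suggesting that the explicit loop from the question and the set-plus-count construction from the answer build the same dictionary. The sample sentence and the names `loop_counts` and `set_counts` are made up for illustration.

```python
# Hypothetical sample text, used only for this illustration.
text = "a b a c b a"
words = text.split()

# Explicit loop, as in the question: add a word on first sight, then increment.
loop_counts = {}
for word in words:
    if word not in loop_counts:
        loop_counts[word] = 1
    else:
        loop_counts[word] += 1

# Set-based version, as in the answer: count each unique word in the full list.
set_counts = dict((word, words.count(word)) for word in set(words))

print(loop_counts == set_counts)  # True: both map each word to its count
```

Dictionary equality ignores insertion order, so the comparison holds even though the set may yield the words in a different order than the loop first saw them.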