I am new to coding and I am trying to create a dictionary from a large body of text and would also like the most frequent word to be shown?
For example, if I had a block of text such as:
text = '''George Gordon Noel Byron was born, with a clubbed right foot, in London on January 22, 1788. He was the son of Catherine Gordon of Gight, an impoverished Scots heiress, and Captain John (“Mad Jack”) Byron, a fortune-hunting widower with a daughter, Augusta. The profligate captain squandered his wife’s inheritance, was absent for the birth of his only son, and eventually decamped for France as an exile from English creditors, where he died in 1791 at 36.'''
I know the steps I would like the code to take. I want words that are the same but capitalised to be counted together so Hi and hi would count as Hi = 2.
I am trying to get the code to loop through the text and create a dictionary showing how many times each word appears. My final goal is to them have the code state which word appears most frequently.
I don't know how to approach such a large amount of text, the examples I have seen are for a much smaller amount of words.
I have tried to remove white space and also create a loop but I am stuck and unsure if I am going the right way about coding this problem.
a.replace(" ", "")
#this gave built-in method replace of str object at 0x000001A49AD8DAE0>, I have now idea what this means!
print(a.replace) # this is what I tried to write to remove white spaces
I am unsure of how to create the dictionary.
To count the word frequency would I do something like:
frequency = {}
for value in my_dict.values() :
if value in frequency :
frequency[value] = frequency[value] + 1
else :
frequency[value] = 1
What I was expecting to get was a dictionary that lists each word shown with a numerical value showing how often it appears in the text.
Then I wanted to have the code show the word that occurs the most.