I have a text file that holds a short story, and I want to analyze it to make different types of graphs. I found plenty of examples that read a text file of numeric data, but none that read actual words. I know I can do something like this:
count = 0
with open('short_story.txt') as f:
    for line in f:
        for word in line.split():
            count += 1  # one more word seen
to count the words in the file. But is that the appropriate way to do it when I am using numpy and matplotlib? If anyone could explain how to work with a text file of words rather than numeric data, that would be great.
************
"Radio for warships, eh?" he muttered. A wireless transmitter was one of many modern innovations that the Virginia did not boast. She had been gathering copra and shell among the islands long before such things came into common use, though Dan had invested his modest savings in her only a year before.
"What would anyone want with warships on Davis Island?" The name roused a vague memory. "Davis Island?" he repeated, staring in concentration at the black sea. "Of course!" It came to him suddenly. A newspaper article that he had read five years before, at about the time he had abandoned college in the middle of his junior year, to follow the call of adventure.
The account had dealt with an eclipse of the sun, visible only from certain points on the Pacific. One Dr. Hunter, under the auspices of a Western university, had sailed with his instruments and assistants to Davis Island, to study the solar corona during the few precious moments when the shadow covered the sun, and to observe the displacement of certain stars as a test of Einstein's theory of relativity.
from collections import Counter
# Read the whole story and split it into whitespace-separated words
with open('story.txt', 'r') as f:
    words = f.read().split()
# Count every word in one pass and print the five most common (word, count) pairs
print(Counter(words).most_common(5))
I found the top five words. Now I want to plot them in something like a bar graph. These are the top five words I got:
[('the', 4826), ('of', 2276), ('and', 1825), ('a', 1761), ('to', 1693)]
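One possible way to turn that list into a bar graph is sketched below. It assumes matplotlib and numpy are installed, hard-codes the five pairs shown above instead of re-reading the file, and writes to a file named top_words.png, a hypothetical name chosen only for illustration. Treat it as a starting point, not the definitive approach.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Top five (word, count) pairs, copied from the output above
top_words = [('the', 4826), ('of', 2276), ('and', 1825), ('a', 1761), ('to', 1693)]

# Split the pairs into bar labels and bar heights
labels, counts = zip(*top_words)
positions = np.arange(len(labels))  # one x position per word

fig, ax = plt.subplots()
bars = ax.bar(positions, counts, align='center')
ax.set_xticks(positions)
ax.set_xticklabels(labels)  # show the words instead of 0..4 on the x axis
ax.set_xlabel('Word')
ax.set_ylabel('Occurrences')
ax.set_title('Five most frequent words')

# Hypothetical output file name; call plt.show() instead for an interactive window
fig.savefig('top_words.png')
```

The words themselves are never passed to matplotlib as data. The bars are drawn at the numeric positions 0 through 4, and the words are attached afterwards as tick labels, which is the usual way to plot counts for non-numeric categories.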