Hi, I have been playing with a simple program that reads in text and identifies keywords whose initial letter is capitalised. The issue I am having is that the program does not remove punctuation from words. For example, "Frodo", "Frodo." and "Frodo," come up as different entries rather than as the same word. I tried using import string and playing around with punctuation, but it did not work.
Below is my code. The text I used was from http://www.angelfire.com/rings/theroaddownloads/fotr.pdf (copied into a text file called novel.txt). Thanks again.
# Count every whitespace-separated word that starts with a capital letter
by_word = {}
with open('novel.txt') as f:
    for line in f:
        for word in line.strip().split():
            if word[0].isupper():
                if word in by_word:
                    by_word[word] += 1
                else:
                    by_word[word] = 1
# Sort (count, word) pairs from most to least frequent
by_count = []
for word in by_word:
    by_count.append((by_word[word], word))
by_count.sort()
by_count.reverse()
# Print the 100 most frequent entries
for count, word in by_count[:100]:
    print(count, word)
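
One possible way to get "Frodo", "Frodo." and "Frodo," counted together is to strip punctuation from both ends of each word before testing and counting it. The sketch below is only an illustration under a few assumptions: the function name count_capitalised is made up for this example, collections.Counter is swapped in for the hand-written dictionary, and string.punctuation covers ASCII punctuation only, so curly quotes or dashes copied out of a PDF would need to be added to the strip set separately.

```python
import string
from collections import Counter


def count_capitalised(lines):
    """Count capitalised words, ignoring leading/trailing ASCII punctuation.

    `lines` can be any iterable of strings, e.g. an open text file.
    (Hypothetical helper name, not part of the original program.)
    """
    counts = Counter()
    for line in lines:
        for word in line.split():
            # Remove punctuation from both ends only: "Frodo," -> "Frodo".
            # Punctuation inside a word (e.g. "Sam's") is left alone.
            cleaned = word.strip(string.punctuation)
            # A token made only of punctuation becomes "", so check first.
            if cleaned and cleaned[0].isupper():
                counts[cleaned] += 1
    return counts


if __name__ == "__main__":
    sample = ["Frodo Frodo. Frodo, are coming up as one entry now."]
    for word, count in count_capitalised(sample).most_common(100):
        print(count, word)
```

To run it on the novel, the open file can be passed in directly, for instance count_capitalised(f) inside with open('novel.txt') as f:. Note that most_common(100) replaces the manual sort and reverse steps.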