I am new to Python and need help writing a text content analyzer that reports 7 things about a text file:
- Total word count
- Total count of unique words (without case and special characters interfering)
- The number of sentences
- Average number of words per sentence
- Commonly used phrases (a phrase of 3 or more words used more than 3 times)
- A list of words used, in order of descending frequency (without case and special characters interfering)
- The ability to accept input from STDIN, or from a file specified on the command line
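The last requirement, reading from STDIN or from a file named on the command line, could be handled along these lines. This is only a minimal sketch in Python 3: `read_text` is a hypothetical helper name, and it assumes the first command-line argument, if present, is the file path.

```python
import sys


def read_text(args, stdin=None):
    """Return the text to analyze.

    args  -- command-line arguments after the script name (e.g. sys.argv[1:])
    stdin -- stream to fall back on; defaults to sys.stdin (assumption: this
             parameter exists only to make the function easy to test)
    """
    if args:
        # A file was named on the command line: read it in full.
        with open(args[0], "r") as f:
            return f.read()
    # No file given: read everything piped or typed into STDIN.
    if stdin is None:
        stdin = sys.stdin
    return stdin.read()
```

A script would then call `text = read_text(sys.argv[1:])`, so both `python analyzer.py 20words.txt` and `python analyzer.py < 20words.txt` end up with the same string (the script name `analyzer.py` is illustrative).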
So far I have this Python program to print total word count:
    with open('/Users/name/Desktop/20words.txt', 'r') as f:
        p = f.read()
    words = p.split()
    wordCount = len(words)
    print "The total word count is:", wordCount
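The sentence count and the average words per sentence can build on the same `split()` idea. The sketch below is one possible approach rather than a definitive one: `sentence_stats` is a hypothetical name, and it assumes a sentence ends at one or more of `.`, `!`, or `?`, so abbreviations such as "Dr." would be miscounted.

```python
import re


def sentence_stats(text):
    """Return (sentence_count, average_words_per_sentence) for text."""
    # Assumption: sentences are separated by runs of '.', '!' or '?'.
    # Pieces that contain no words (e.g. the empty tail after the last '.')
    # are discarded.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.split()]
    if not sentences:
        return 0, 0.0
    total_words = sum(len(s.split()) for s in sentences)
    # float() keeps the division exact even under Python 2.
    return len(sentences), float(total_words) / len(sentences)
```

For example, `sentence_stats("The dog ran. The cat sat down! Why?")` finds three sentences containing 3, 4, and 1 words.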
I also have this Python program to print unique words and their frequencies. It does not print them in order, and it treats `dog`, `dog.`, `"dog`, and `dog,` as different words:
    file = open("/Users/name/Desktop/20words.txt", "r+")
    wordcount = {}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    for k, v in wordcount.items():
        print k, v
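The punctuation, case, and ordering problems described above might be addressed by normalizing the text before counting and letting `collections.Counter` do the sorting. This is a hedged sketch in Python 3, not a complete analyzer: the function names are hypothetical, a "word" is assumed to be a run of letters and digits with optional internal apostrophes, and `common_phrases` only looks at phrases of exactly `size` words and lets them span sentence boundaries.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase text and return its words without surrounding punctuation."""
    # Assumption: a word is letters/digits, optionally joined by apostrophes
    # (so "don't" stays one word, while dog. and "dog both become dog).
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def word_frequencies(text):
    """Return (word, count) pairs in order of descending frequency."""
    return Counter(normalize(text)).most_common()


def common_phrases(text, size=3, min_count=4):
    """Return (phrase, count) pairs for size-word phrases seen min_count+ times.

    min_count=4 reflects "used more than 3 times".
    """
    words = normalize(text)
    # Slide a window of `size` words across the text and count each phrase.
    grams = Counter(
        " ".join(words[i:i + size]) for i in range(len(words) - size + 1)
    )
    return [(p, c) for p, c in grams.most_common() if c >= min_count]
```

With these helpers, the unique word count would be `len(set(normalize(text)))`, and longer phrases could be checked by calling `common_phrases` again with a larger `size`.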
Thank you for any help you can give!