I have a function that should return, as a list of tuples, (a) the number of words per sentence and (b) the mean length of the words in each sentence. I can get (a). For (b), I can get the total number of characters per sentence but not the mean.
I've reviewed a few posts (this, that and another) but can't wrap my head around this last piece.
I've included a couple of failed attempts, commented out.
import statistics
def sentence_num_and_mean(text):
    """ Output list of, per sentence, number of words and mean length of words """
    # Replace ! and ? with .
    for ch in ['!', '?']:
        if ch in text:
            text = text.replace(ch, '.')
    # Number of words per sentence
    num_words_per_sent = [len(element) for element in (element.split() for element in text.split("."))]
    # Mean length of words per sentence
    # This gets sum of characters per sentence, so on the right track
    mean_len_words_per_sent = [len(w) for w in text.split('.')]
    # This gives me "TypeError: unsupported operand type(s) for /: 'int' and 'list'" error
    # when trying to get the denominator for the mean
    # A couple efforts
    #mean_len_words_per_sent = sum(num_words_per_sent) / [len(w) for w in text.split('.')]
    #mean_len_words_per_sent = [(num_words_per_sent)/statistics.mean([len(w) for w in text.split()])]
    # Return list zipped together
    return list(zip(num_words_per_sent, mean_len_words_per_sent))
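To isolate the TypeError from the first commented-out attempt, here is a minimal sketch. The numbers are hard-coded for illustration (they match the output printed further down) rather than computed from the text; the point is only that dividing an int by a whole list fails the same way.

```python
# Hard-coded stand-ins for the two lists built inside the function
word_counts = [6, 7, 2, 2, 3]
char_counts = [33, 35, 15, 17, 15]

try:
    # sum(...) is an int, char_counts is a list: same shape as my first attempt
    sum(word_counts) / char_counts
except TypeError as err:
    message = str(err)
    print(message)
```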
Driver program:
split_test = "First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah"
func_test = sentence_num_and_mean(split_test)
print(split_test)
print(func_test)
which prints
First sentence ends with a period. Next one ends with a question mark? Another period. Then exclamation! Blah blah blah
[(6, 33), (7, 35), (2, 15), (2, 17), (3, 15)]
I know I still need to strip out the spaces and periods from the character counts. Ignoring that for now and dividing each character count by its word count, if I did the simple math right the output should be:
[(6, 5.5), (7, 5), (2, 7.5), (2, 8.5), (3, 5)]
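To show the arithmetic I mean, here is a quick sketch that takes the pairs from the printed output above as hard-coded input and divides each character count by its word count. The names `pairs` and `expected` are just for illustration, and the counts still include spaces.

```python
# Pairs copied from the printed output: (word count, raw character count)
pairs = [(6, 33), (7, 35), (2, 15), (2, 17), (3, 15)]

# Divide each raw character count by the number of words in that sentence
expected = [(n_words, n_chars / n_words) for n_words, n_chars in pairs]
print(expected)  # [(6, 5.5), (7, 5.0), (2, 7.5), (2, 8.5), (3, 5.0)]
```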