Counting Avg Number of Words Per Sentence

Question

I'm having a bit of trouble counting the number of words per sentence. In my case, I'm assuming sentences only end with "!", "?", or ".".

I have a list that looks like this:

["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]

For the example above, the calculation would be (1 + 3 + 5) / 3 = 3. I'm having a hard time achieving this, though! Any ideas?
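For the sample list, the three sentences have 1, 3 and 5 words, so the value being asked for is their mean. Here is the arithmetic written out in Python. The counts are taken by hand from the list above, and `/` is assumed to be Python 3 true division:

```python
# Words per sentence: "Hey" | "How are you" | "I would like a sandwich"
counts = [1, 3, 5]
average = sum(counts) / len(counts)  # (1 + 3 + 5) / 3
print(average)  # 3.0
```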

score 3 · Answer 1 · answered Feb 09 '17 at 18:21

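words = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]

sentences = [[]]       # list of sentences, each one a list of words
ends = set(".?!")      # tokens that close a sentence
for word in words:
    if word in ends:
        sentences.append([])        # start a new sentence
    else:
        sentences[-1].append(word)  # add the word to the current sentence

# drop empty sentences (after the final ".", or between consecutive terminals)
sentences = [s for s in sentences if s]
if sentences:
    print("average sentence length:", sum(len(s) for s in sentences) / len(sentences))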
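A more compact way to do the same splitting, offered as a sketch rather than as part of the original answer, is `itertools.groupby`. It cuts the token list into alternating runs of terminal and non-terminal tokens, and each non-terminal run is one sentence. The helper name `sentence_lengths` is hypothetical. Consecutive terminals such as `?` followed by `!` form a single run, so they never produce an empty sentence.

```python
from itertools import groupby

TERMINALS = {".", "?", "!"}


def sentence_lengths(tokens, terminals=TERMINALS):
    """Return the word count of each sentence (hypothetical helper).

    groupby() yields alternating runs of terminal and non-terminal tokens;
    every non-terminal run is one sentence.
    """
    return [
        len(list(run))
        for is_terminal, run in groupby(tokens, key=lambda t: t in terminals)
        if not is_terminal
    ]


words = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
lengths = sentence_lengths(words)
print(lengths)  # [1, 3, 5]
if lengths:  # avoid dividing by zero on empty input
    print("average sentence length:", sum(lengths) / len(lengths))
```

Because it returns the individual counts, the same helper also serves if you later need per-sentence statistics rather than only the average.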

Fagan · Answer 2 · score 3 · 2017-02-09T19:05:35.610

A simple solution:

mylist = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
terminals = set([".", "?", "!"]) # sets are efficient for "membership" tests
terminal_count = 0

for item in mylist:
    if item in terminals: # here is our membership test
        terminal_count += 1

avg = (len(mylist) - terminal_count) / float(terminal_count)  # float() avoids integer division on Python 2

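This assumes you only care about the average, not the individual counts per sentence. It also assumes every sentence, including the last one, ends with a terminal: an unterminated final sentence adds its words to the total without adding to the sentence count.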

If you'd like to get a little fancy, you can replace the for loop with something like this:

terminal_count = sum(1 for item in mylist if item in terminals)
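The counting shortcut divides by the number of terminals, so it quietly assumes every sentence ends with exactly one. The sketch below wraps the shortcut in a function (the name `average_by_terminals` is hypothetical), returns `None` instead of dividing by zero when there are no terminals, and shows how an unterminated final sentence skews the result. It assumes Python 3, where `/` always returns a float.

```python
TERMINALS = {".", "?", "!"}


def average_by_terminals(tokens, terminals=TERMINALS):
    """Average words per sentence via the terminal-counting shortcut.

    Assumes every sentence ends with exactly one terminal token.
    Returns None when there are no terminals, instead of dividing by zero.
    """
    terminal_count = sum(1 for item in tokens if item in terminals)
    if terminal_count == 0:
        return None
    return (len(tokens) - terminal_count) / terminal_count


mylist = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
print(average_by_terminals(mylist))  # 3.0

# "Bye" has no terminal, so 2 words are divided by 1 terminal:
# the result is 2.0, although the two sentences ("Hi", "Bye") average 1.0 word.
print(average_by_terminals(["Hi", ".", "Bye"]))  # 2.0
```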

edited Feb 09 '17 at 19:05 · answered Feb 09 '17 at 18:35 by Fagan

That's pretty clever. It would be a bit better to store the terminals in a `set` before the loop. Or if you think that's overkill, you could at least write the condition more simply as `if item in ".!?"` – janos Feb 09 '17 at 18:39
@janos Good call on pulling the terminals out into their own constant. I'd prefer a list over a string, for clarity's sake. – Fagan Feb 09 '17 at 18:50
Why a `list`? Why not a `set`? – janos Feb 09 '17 at 18:51
@janos You've educated me! I'm reading that sets are much more efficient for membership tests like `x in y`, so I'll update accordingly. – Fagan Feb 09 '17 at 19:00
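One caveat about the `if item in ".!?"` suggestion: on a string, `in` is a substring test, not an element test, so an empty token or some multi-character tokens such as `"!?"` also match. A set only matches exact elements. A small demonstration (the constant names are illustrative):

```python
TERMINAL_STR = ".!?"
TERMINAL_SET = {".", "!", "?"}

for token in ["!", "", "!?", "?!"]:
    # str membership is a substring test; set membership is an exact-element test
    print(repr(token), token in TERMINAL_STR, token in TERMINAL_SET)
```

With well-formed single-character tokens both forms behave the same; the difference only shows up for unexpected tokens, which is one more reason to prefer the set.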

RomanPerekhrest · Answer 3 · score 1 · 2017-02-09T18:36:15.047

A short solution using the re.split() and sum() functions:

import re
s = "Hey ! How are you ? I would like a sandwich ."
parts = [len(l.split()) for l in re.split(r'[?!.]', s) if l.strip()]

print(sum(parts)/len(parts))

The output:

3.0

If the input is a list of words rather than a string:

import re
s = ["Hey", "!", "How", "are", "you", "?", "I", "would", "like", "a", "sandwich", "."]
parts = [len(l.split()) for l in re.split(r'[?!.]', ' '.join(s)) if l.strip()]

print(sum(parts)/len(parts))   # 3.0
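Because `re.split()` splits on the punctuation characters themselves, the same idea also works on ordinary text where punctuation is attached to the preceding word. The sketch below wraps it in a function (the name `avg_words_per_sentence` is hypothetical) and returns `0.0` for text with no words instead of dividing by zero. It still treats every `.` as a sentence end, so a decimal number such as `3.5` splits a sentence in two.

```python
import re


def avg_words_per_sentence(text):
    """Average words per sentence, treating each ., ? or ! as a sentence end."""
    # Split on terminal punctuation and drop empty or whitespace-only pieces
    sentences = [part for part in re.split(r"[?!.]", text) if part.strip()]
    if not sentences:
        return 0.0
    # str.split() with no argument splits on any run of whitespace
    return sum(len(part.split()) for part in sentences) / len(sentences)


print(avg_words_per_sentence("Hey! How are you? I would like a sandwich."))  # 3.0
print(avg_words_per_sentence("Really?! Yes."))  # 1.0

# Caveat: the "." in "3.5" also counts as a sentence end,
# giving 2.5 instead of 4.0 for this single four-word sentence
print(avg_words_per_sentence("It costs 3.5 dollars."))  # 2.5
```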

edited Feb 09 '17 at 18:36 · answered Feb 09 '17 at 18:23 by RomanPerekhrest

The second one worked really well! I like the use of regular expressions, as I'm working on an NLP project of sorts. – natalien Feb 09 '17 at 19:19
