Finding average word length in a string

Question

def word_count(x: str) -> None:
    characters = len(x)
    words = x.split()  # whitespace-separated tokens, punctuation still attached
    average = sum(len(w) for w in words) / len(words)
    print('Characters: ' + str(characters) + '\n' + 'Words: ' + str(len(words)) + '\n' + 'Avg word length: ' + str(average) + '\n')

This code works fine for normal strings, but for a string like:

'***The ?! quick brown cat:  leaps over the sad boy.'

How do I edit the code so that tokens like "***" and "?!" aren't counted? The average word length of the sentence above should turn out to be 3.888889, but my code is giving me another number.

You'll have to be more precise as to what you want filtered out. But the basic idea would be to remove the rejected "words" from x.split(), and use that reduced list instead. – Scott Hunter, Oct 31 '15 at 01:14
If the issue is removing unwanted characters from certain words, you'd have to spell that out as well. – Scott Hunter, Oct 31 '15 at 01:15
Using `re` to filter out what you don't want included (double spaces, special characters, etc.) would be a relatively simple way to achieve this. – Demian Brecht, Oct 31 '15 at 01:15
Everything except actual alphabet letters is filtered out for the average calculation, I believe. – Ramon Hallan, Oct 31 '15 at 01:16

score 2 · Answer 1 · answered Oct 31 '15 at 02:57


Strings have a .translate() method you can use for this (if you know all the characters you want deleted). In Python 2, the second argument lists the characters to delete:

>>> "***foo ?! bar".translate(None, "*?!")
'foo  bar'
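The two-argument call above is Python 2 only. As a rough Python 3 sketch of the same idea: str.translate takes a single table, which can be built with str.maketrans. The variable names here are illustrative, and using string.punctuation as the deletion set is an assumption about what should be filtered out.

```python
import string

# Python 3: the third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", "*?!")
cleaned = "***foo ?! bar".translate(table)
print(repr(cleaned))  # 'foo  bar'

# Assumption: every ASCII punctuation character should be dropped.
phrase = '***The ?! quick brown cat:  leaps over the sad boy.'
punct_table = str.maketrans("", "", string.punctuation)
words = phrase.translate(punct_table).split()
print(words)
```

Calling split() with no arguments collapses the doubled spaces left behind, so tokens that were pure punctuation vanish instead of becoming empty words.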


thebjorn


flamenco · Accepted Answer · 2015-10-31T03:05:08.317

Try this:

import re

def avrg_count(x):
    total_chars = len(re.sub(r'[^a-zA-Z0-9]', '', x))
    num_words = len(re.sub(r'[^a-zA-Z0-9 ]', '', x).split())
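    # Parenthesized print runs on Python 2 and 3; .12g keeps 12 significant digits, as Python 2's str() did
    print("Characters:{0}\nWords:{1}\nAverage word length: {2:.12g}".format(total_chars, num_words, total_chars/float(num_words)))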


phrase = '***The ?! quick brown cat:  leaps over the sad boy.'

avrg_count(phrase)

Output:

Characters:34
Words:9
Average word length: 3.77777777778

score 0 · Answer 3 · edited May 23 '17 at 11:58


You should be able to trim all non-alphanumeric characters from each word, and then only use the word when the length is still greater than 0. The first solution I found was a regex solution, but you might be able to find other ways to get it done.

See the related question "Stripping everything but alphanumeric chars from a string in Python".
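A minimal sketch of that per-word approach, assuming "alphanumeric" means ASCII letters and digits. The helper name clean_words is hypothetical and not from the original answer.

```python
import re

def clean_words(text):
    """Split on whitespace, strip non-alphanumerics from each token, drop empties."""
    stripped = (re.sub(r'[^a-zA-Z0-9]', '', token) for token in text.split())
    # Tokens such as '?!' become empty strings and are filtered out here.
    return [word for word in stripped if word]

phrase = '***The ?! quick brown cat:  leaps over the sad boy.'
words = clean_words(phrase)
average = sum(len(w) for w in words) / len(words)  # true division in Python 3
print(words)
print(round(average, 2))
```

For the sample sentence this keeps 9 words with 34 characters in total, in line with the counts reported in the accepted answer.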


answered Oct 31 '15 at 01:15

Andrew Shirley


score 0 · Answer 4 · answered Oct 31 '15 at 02:49


import re

full_sent = '***The ?! quick brown cat:  leaps over the sad boy.'
alpha_sent = re.findall(r'\w+',full_sent)
print(alpha_sent)

Will output:

['The', 'quick', 'brown', 'cat', 'leaps', 'over', 'the', 'sad', 'boy']

To get the average you can do:

average = sum(len(word) for word in alpha_sent)/len(alpha_sent)

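Which gives about 3.78 (34 letters / 9 words) under Python 3's true division; in Python 2, convert one operand to float first.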


Leb


I'm having trouble incorporating this into my function. Would you mind briefly plugging it into my code above? – Ramon Hallan Oct 31 '15 at 03:53
You don't need to restructure anything. If you mean the other prints, `word` becomes `len(alpha_sent)` and `char` becomes `sum(len(word) for word in alpha_sent)`. – Leb Oct 31 '15 at 03:56
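Putting that suggestion into the asker's function might look roughly like the sketch below (Python 3). The split into a word_stats helper and the empty-input guard are assumptions added here. Note that `\w` also matches digits and underscores, so it is slightly broader than "letters only".

```python
import re

def word_stats(x: str):
    """Return (characters, words, average word length), counting only \\w+ runs."""
    words = re.findall(r'\w+', x)
    characters = sum(len(w) for w in words)
    # Guard against empty input to avoid dividing by zero (an added assumption).
    average = characters / len(words) if words else 0.0
    return characters, len(words), average

def word_count(x: str) -> None:
    characters, words, average = word_stats(x)
    print('Characters: ' + str(characters) + '\n'
          + 'Words: ' + str(words) + '\n'
          + 'Avg word length: ' + str(average) + '\n')

word_count('***The ?! quick brown cat:  leaps over the sad boy.')
```

Returning the numbers from a separate helper keeps the printing format from the question intact while making the calculation easy to check on its own.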
