def word_count(x: str) -> None:
    words = x.split()  # whitespace-separated tokens, punctuation included
    char = len(x)
    word = len(words)
    avg = sum(len(w) for w in words) / word
    print('Characters: ' + str(char) + '\n' + 'Words: ' + str(word) + '\n' + 'Avg word length: ' + str(avg) + '\n')

This code works fine for normal strings, but for a string like:

'***The ?! quick brown cat:  leaps over the sad boy.'

How do I edit the code so that tokens like "***" and "?!" aren't counted? The average word length of the sentence above should turn out to be 3.888889, but my code is giving me another number.

Ramon Hallan
  • You'll have to be more precise as to what you want filtered out. But the basic idea would be to remove the rejected "words" from x.split(), and use that reduced list instead. – Scott Hunter Oct 31 '15 at 01:14
  • If the issue is removing unwanted characters from certain words, you'd have to spell that out instead/too. – Scott Hunter Oct 31 '15 at 01:15
  • Using `re` to filter out what you don't want included (e.g. double spaces, special characters, etc.) would be a relatively simple way of achieving this. – Demian Brecht Oct 31 '15 at 01:15
  • Everything is filtered out for the average calculation except for actual alphabet letters, I believe – Ramon Hallan Oct 31 '15 at 01:16

4 Answers

Strings have a .translate() method you can use for this (if you know all characters you want deleted):

>>> "***foo ?! bar".translate(None, "*?!")
'foo  bar'
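
The two-argument `translate(None, ...)` call above only works on Python 2 byte strings. On Python 3, a rough equivalent (a minimal sketch, assuming the same three characters are the ones to delete; `delete_table` and `cleaned` are illustrative names) builds a deletion table with `str.maketrans`:

```python
# Python 3: the third argument of str.maketrans lists characters to delete.
delete_table = str.maketrans('', '', '*?!')

cleaned = "***foo ?! bar".translate(delete_table)
print(repr(cleaned))  # 'foo  bar' (two spaces remain where "?!" stood)
```

To drop all ASCII punctuation rather than a hand-picked set, `string.punctuation` could be passed as the third argument instead.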
thebjorn

Try this:

import re

def avrg_count(x):
    total_chars = len(re.sub(r'[^a-zA-Z0-9]', '', x))
    num_words = len(re.sub(r'[^a-zA-Z0-9 ]', '', x).split())
    print "Characters:{0}\nWords:{1}\nAverage word length: {2}".format(total_chars, num_words, total_chars/float(num_words))


phrase = '***The ?! quick brown cat:  leaps over the sad boy.'

avrg_count(phrase)

Output:

Characters:34
Words:9
Average word length: 3.77777777778
flamenco

You should be able to trim all non-alphanumeric characters from each word, and then only use the word when the length is still greater than 0. The first solution I found was a regex solution, but you might be able to find other ways to get it done.

Stripping everything but alphanumeric chars from a string in Python
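
The idea above could be sketched roughly as follows. This is only an illustration, not code from the linked question: `clean_words` is a hypothetical helper name, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re

def clean_words(text):
    """Split on whitespace, remove non-alphanumeric characters from each
    token, and keep only the tokens that are still non-empty."""
    stripped = (re.sub(r'[^a-zA-Z0-9]', '', token) for token in text.split())
    return [word for word in stripped if word]

phrase = '***The ?! quick brown cat:  leaps over the sad boy.'
words = clean_words(phrase)
print(words)
print(sum(len(w) for w in words) / len(words))  # average length on Python 3
```

Note that `re.sub` here removes such characters anywhere in a token, not just at its ends, so a word like "don't" would become "dont".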

Andrew Shirley

import re

full_sent = '***The ?! quick brown cat:  leaps over the sad boy.'
alpha_sent = re.findall(r'\w+',full_sent)
print(alpha_sent)

Will output:

['The', 'quick', 'brown', 'cat', 'leaps', 'over', 'the', 'sad', 'boy']

To get average you can do:

average = sum(len(word) for word in alpha_sent)/len(alpha_sent)

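Which will give approximately 3.78 (34 / 9) on Python 3; on Python 2 the `/` here is integer division unless one operand is converted to `float`.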

Leb
  • I'm having trouble incorporating this into my function. Would you mind briefly plugging it into my code above? – Ramon Hallan Oct 31 '15 at 03:53
  • You don't need to incorporate it. If you're talking about the other prints, then `word` will be `len(alpha_sent)` and `char` will be `sum(len(word) for word in alpha_sent)`. – Leb Oct 31 '15 at 03:56
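
Putting the last comment together with the question's function, one possible version is sketched below. It assumes Python 3; `word_stats` is a hypothetical helper added here so the numbers can be checked separately from the printing.

```python
import re

def word_stats(x: str):
    # Count only runs of word characters (letters, digits, underscore).
    alpha_sent = re.findall(r'\w+', x)
    char = sum(len(w) for w in alpha_sent)
    word = len(alpha_sent)
    avg = char / word if word else 0.0  # avoid dividing by zero on empty input
    return char, word, avg

def word_count(x: str) -> None:
    char, word, avg = word_stats(x)
    print('Characters: ' + str(char) + '\n' + 'Words: ' + str(word) + '\n' + 'Avg word length: ' + str(avg) + '\n')

phrase = '***The ?! quick brown cat:  leaps over the sad boy.'
word_count(phrase)
```

For the sample sentence this counts 34 characters over 9 words. Since `\w` also matches digits and underscores, a pattern such as `[A-Za-z]+` may be closer to the intent if only alphabet letters should count.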