79

How would I go about counting the words in a sentence? I'm using Python.

For example, I might have the string:

string = "I     am having  a   very  nice  23!@$      day. "

That would be 7 words. I'm having trouble with the random amount of spaces after/before each word as well as when numbers or symbols are involved.

smci
HossBender
  • 2
    To accommodate the numbers, you can change the regex. `\w` matches `[a-zA-Z0-9_]`. Now, you need to define what your use case is. What happens to `I am fine2`? Would it be 2 words or 3? – karthikr Oct 16 '13 at 17:58
  • You needed to explicitly add *"ignoring numbers, punctuation and whitespace"* since that's part of the task. – smci Jul 05 '18 at 19:06
  • FYI some punctuation symbols may merit separate consideration. Otherwise, *"carry-on luggage"* becomes three words, as does *"U.S.A."* So answers may want to parameterize what punctuation is allowed, rather than blanket regex like `\S+` – smci Jul 05 '18 at 19:51
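
As a rough illustration of the comment above, here is a minimal sketch that makes the allowed in-word punctuation a parameter. The helper name `count_words` and its `inner_punct` argument are made up for this example and are not from any answer; it also assumes a word must start with an ASCII letter.

```python
import re

def count_words(text, inner_punct="-.'"):
    """Count tokens that start with a letter (hypothetical helper).

    Characters listed in inner_punct may appear inside a token,
    so "carry-on" and "U.S.A." each count as one word.
    """
    # re.escape keeps characters such as "-" and "." literal inside the class
    pattern = r"[A-Za-z][A-Za-z" + re.escape(inner_punct) + r"]*"
    return len(re.findall(pattern, text))

print(count_words("carry-on luggage"))                  # 2
print(count_words("U.S.A."))                            # 1
print(count_words("carry-on luggage", inner_punct=""))  # 3
```

With the default setting, the string from the question still gives 7, because `23!@$` contains no letters.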

8 Answers

111

str.split() without any arguments splits on runs of whitespace characters:

>>> s = 'I am having a very nice day.'
>>> 
>>> len(s.split())
7

From the linked documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
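
To make the quoted behaviour concrete, here is a small sketch run on the exact string from the question (the variable names are only for illustration). Note that on that string `split()` reports 8 rather than 7, because `23!@$` is kept as a token.

```python
s = "I     am having  a   very  nice  23!@$      day. "

# No argument: runs of whitespace act as one separator, no empty strings
tokens = s.split()
print(tokens)       # ['I', 'am', 'having', 'a', 'very', 'nice', '23!@$', 'day.']
print(len(tokens))  # 8

# Explicit separator: every single space splits, so empty strings appear
naive = s.split(" ")
print("" in naive)  # True
```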

arshajii
  • 9
    One (very minor) disadvantage of this would be that you could have punctuation groups counted as words. For example, in `'I am having a very nice day -- or at least I was.'`, you'd get `--` counted as a word. `isalnum` might help, I guess, depending on the OP's definition of "word". – DSM Oct 16 '13 at 17:38
  • This seems to be faster than regex – Alaa M. Jan 06 '18 at 09:49
  • Of course it's faster, but it's also much more limited. – Gabriel Jul 05 '18 at 19:08
  • 3
    Nope, counts punctuation: `'apple & orange'.split()` gives `['apple', '&', 'orange']` – stelios Aug 19 '18 at 12:30
66

You can use re.findall():

import re
line = " I am having a very nice day."
count = len(re.findall(r'\w+', line))
print (count)
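
One caveat, sketched below on the string from the question: `\w` also matches digits, so `23` is counted and the result is 8. Restricting the pattern to letters is one possible way to get the 7 the question asks for; the character class `[A-Za-z]` is an assumption here that the text is plain ASCII.

```python
import re

s = "I     am having  a   very  nice  23!@$      day. "

with_digits = re.findall(r"\w+", s)         # '23' is matched as a word
letters_only = re.findall(r"[A-Za-z]+", s)  # digits and symbols are skipped

print(len(with_digits))   # 8
print(len(letters_only))  # 7
```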
karthikr
7
import string

s = "I     am having  a   very  nice  23!@$      day. "
sum(word.strip(string.punctuation).isalpha() for word in s.split())

The statement above goes through each whitespace-separated chunk of text, strips leading and trailing punctuation, and counts the chunk only if what remains consists entirely of letters.
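
A quick sketch of where this approach may undercount: `strip` only removes punctuation at the ends of a chunk, so chunks with punctuation in the middle fail the `isalpha()` check. The wrapper name `count_alpha_tokens` is hypothetical and only used for this demonstration.

```python
import string

def count_alpha_tokens(text):
    # Same expression as above, wrapped in a function for testing
    return sum(word.strip(string.punctuation).isalpha() for word in text.split())

# "don't" and "carry-on" keep their inner punctuation, so they are skipped
print(count_alpha_tokens("I don't like carry-on luggage."))  # 3
print(count_alpha_tokens("I     am having  a   very  nice  23!@$      day. "))  # 7
```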

4b0
boon kwee
  • 2
    1. Using `i` as a nonindex variable is really misleading; 2. you don't need to create a list, it's just wasting memory. Suggestion: `sum(word.strip(string.punctuation).isalpha() for word in s.split())` – Gabriel Jul 05 '18 at 19:12
5

This is a simple word counter using regex. The script runs in a loop, which you can terminate by entering "Done".

# Word counter using regex
import re

while True:
    line = input("Enter the string: ")  # use raw_input() on Python 2
    if line == "Done":  # command to terminate the loop
        break
    count = len(re.findall("[a-zA-Z_]+", line))
    print(count)
print("Terminated")
Aliyar
4

OK, here is my version. I noticed that you want your output to be 7, which means you don't want to count special characters and numbers. So here is the regex pattern:

re.findall("[a-zA-Z_]+", string)

Here [a-zA-Z_] matches any single character between a-z (lowercase) or A-Z (uppercase), or an underscore, and the + extends the match to one or more such characters in a row.
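
One thing to keep in mind, sketched below: an explicit a-z/A-Z class only covers ASCII letters, so accented words get split into pieces. The alternative pattern `[^\W\d_]+` (word characters minus digits and underscore) is not part of this answer; it is shown only as one possible Unicode-aware variant.

```python
import re

# "naive cafe" written with an accented i and an accented e
text = "na\u00efve caf\u00e9"

ascii_tokens = re.findall(r"[a-zA-Z_]+", text)    # breaks at accented letters
unicode_tokens = re.findall(r"[^\W\d_]+", text)   # keeps accented letters

print(len(ascii_tokens))    # 3
print(len(unicode_tokens))  # 2
```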


About spaces. If you want to remove all extra spaces, just do:

string = string.strip() # Remove all extra spaces at the start and at the end of the string
while "  " in string: # While there are 2 spaces between words in our string...
    string = string.replace("  ", " ") # ... replace them by one space!
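
For comparison, here is a sketch that wraps the loop above in a function and checks it against a one-line alternative built on `split()` and `join()`. Both function names are made up for this example. The two agree when the only whitespace is spaces; the `split()` version also collapses tabs and newlines, which the loop does not.

```python
def collapse_loop(text):
    # Same idea as the loop above: trim, then squeeze double spaces
    text = text.strip()
    while "  " in text:
        text = text.replace("  ", " ")
    return text

def collapse_split(text):
    # split() drops all runs of whitespace; join() puts single spaces back
    return " ".join(text.split())

s = "I     am having  a   very  nice  23!@$      day. "
print(collapse_loop(s))                        # I am having a very nice 23!@$ day.
print(collapse_loop(s) == collapse_split(s))   # True
```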
JadedTuna
4
    def wordCount(mystring):
        tempcount = 0  # length of the current run of spaces
        count = 1

        try:
            for character in mystring:
                if character == " ":
                    tempcount += 1
                    # Only the first space of a run starts a new word
                    if tempcount == 1:
                        count += 1
                else:
                    tempcount = 0

            return count

        except Exception:
            error = "Not a string"
            return error

    mystring = "I   am having   a    very nice 23!@$      day."

    print(wordCount(mystring))

The output is 8, because 23!@$ is counted as a word.

Community
Darrell White
  • Was looking for a solution without use of builtin functions like strip, split, etc. But this code fails with leading/trailing whitespaces. – Rama Jun 05 '20 at 12:10
  • This is the best answer since it does not use any standard library. – saviour123 Mar 12 '21 at 21:05
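
Regarding the leading/trailing whitespace problem raised in the comments, here is a minimal sketch of one way to handle it while still avoiding `split()` and `strip()`: count the transitions from whitespace into a word instead of counting spaces. The function name is hypothetical, and only space, tab, and newline are treated as whitespace.

```python
def count_words_no_builtins(text):
    count = 0
    in_word = False  # True while the loop is inside a word
    for ch in text:
        if ch in " \t\n":
            in_word = False
        elif not in_word:
            # First non-whitespace character after whitespace: a new word starts
            in_word = True
            count += 1
    return count

print(count_words_no_builtins("  I   am having   a    very nice 23!@$      day.  "))  # 8
print(count_words_no_builtins("   "))  # 0
```

Like the answer above, this still counts `23!@$` as a word.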
3

How about using a simple loop to count the spaces? This assumes the words are separated by exactly one space, with no leading or trailing spaces.

txt = "Just an example here move along"
count = 1
for char in txt:
    if char == " ":
        count += 1
print(count)
Eshan Chattaraj
Anto
0
import string

sentence = "I     am having  a   very  nice  23!@$      day. "
# Remove all punctuation
sentence = sentence.translate(str.maketrans('', '', string.punctuation))
# Remove all digits
sentence = ''.join(char for char in sentence if not char.isdigit())
# Count each transition from a non-space character to whitespace;
# a final word with no trailing whitespace would not be counted
count = 0
for index in range(len(sentence) - 1):
    if sentence[index + 1].isspace() and not sentence[index].isspace():
        count += 1
print(count)
Adam