Trying to count words in a string

Question

I'm trying to analyze the contents of a string. If a word has punctuation mixed into it, I want to replace the punctuation with spaces.

For example, if Johnny.Appleseed!is:a*good&farmer is entered as input, the program should report 6 words, but my code reports 0. I'm not sure how to remove an invalid character.

FYI: I'm using Python 3, and I can't import any libraries.

string = input("type something")
stringss = string.split()

    for c in range(len(stringss)):
        for d in stringss[c]:
            if(stringss[c][d].isalnum != True):
                #something that removes stringss[c][d]
                total+=1
print("words: "+ str(total))

You are over-complicating this. You can iterate over a string using a normal for loop. – squiguy, Jul 06 '13 at 23:06
`d` is an individual character of a string, *not* an index. You are also not calling the `.isalnum()` method, just referencing it. And use `if not` to test for a negative, not `!= True`. – Martijn Pieters, Jul 06 '13 at 23:07
@HarryHarry It's not Pythonic. And just because you are using Python 3 does not mean you cannot import any libraries. If that were true, Python 3 would probably not have been released. – Rushy Panchal, Jul 06 '13 at 23:36
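
Putting the comments above together, a corrected version of the question's loop might look like the sketch below. It uses no imports. The function name `count_words` is hypothetical, and a hard-coded string stands in for `input()`.

```python
def count_words(text):
    """Count words, treating every non-alphanumeric character as a separator."""
    cleaned = ""
    for ch in text:              # iterate over the characters directly, no indexing
        if ch.isalnum():         # note the parentheses: the method has to be called
            cleaned += ch
        else:
            cleaned += " "       # punctuation (and whitespace) becomes a space
    return len(cleaned.split())  # split() with no argument collapses runs of spaces


total = count_words("Johnny.Appleseed!is:a*good&farmer")
print("words: " + str(total))    # words: 6
```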

Ashwini Chaudhary · Accepted Answer · 2013-07-07T00:03:20.350

15

Simple loop based solution:

strs = "Johnny.Appleseed!is:a*good&farmer"
lis = []
for c in strs:
    if c.isalnum() or c.isspace():
        lis.append(c)
    else:
        lis.append(' ')

new_strs = "".join(lis)
print(new_strs)               # prints: Johnny Appleseed is a good farmer
print(new_strs.split())       # prints: ['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']
print(len(new_strs.split()))  # prints: 6

Better solution:

Using regex:

>>> import re
>>> from string import punctuation
>>> strs = "Johnny.Appleseed!is:a*good&farmer"
>>> r = re.compile('[{}]'.format(re.escape(punctuation)))  # escape ], \, ^ and - inside the class
>>> new_strs = r.sub(' ',strs)
>>> len(new_strs.split())
6
>>> # using `re.split`:
>>> strs = "Johnny.Appleseed!is:a*good&farmer"
>>> re.split(r'[^0-9A-Za-z]+',strs)
['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']

edited Jul 07 '13 at 00:03

answered Jul 06 '13 at 23:09

Ashwini Chaudhary


How's regex a better solution, is it faster? – Markus Meskanen Jul 06 '13 at 23:31
@IgnacioVazquez-Abrams You're right about that(`re.sub` and then `str.split` eh!!), I guess `re.split` is a better alternative. – Ashwini Chaudhary Jul 07 '13 at 00:05
`>>> len(re.findall(r'\b\w+\b', 'Johnny.Appleseed!is:a*good&farmer'))` `6` – Ignacio Vazquez-Abrams Jul 07 '13 at 00:05
If you're going to use `re.split`, then I'd go for `re.split('[\W]+', strs)`... but I'd rather the more direct `re.findall` as shown by @IgnacioVazquez-Abrams – Jon Clements Jul 07 '13 at 00:16
@JonClements I think it should be `'[\W_]+'`? Anyways thanks for the useful tip. :) I should work hard on regexes. – Ashwini Chaudhary Jul 07 '13 at 00:22
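
The patterns discussed in these comments differ mainly in how they treat underscores and punctuation at the very start or end of the string. Here is a small sketch to compare them; the sample string `s` is made up for illustration.

```python
import re

s = "!snake_case.word!"

# re.split leaves empty strings where the input starts or ends with a separator.
split_parts = re.split(r'[^0-9A-Za-z]+', s)
print(split_parts)                      # ['', 'snake', 'case', 'word', '']

# \w matches the underscore, so 'snake_case' stays in one piece here.
found = re.findall(r'\w+', s)
print(found)                            # ['snake_case', 'word']

# [\W_]+ also treats the underscore as a separator; drop the empty strings.
words = [w for w in re.split(r'[\W_]+', s) if w]
print(words)                            # ['snake', 'case', 'word']
```

Which behaviour is right depends on whether an underscore should count as part of a word.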

score 11 · Answer 2 · edited May 23 '17 at 12:16

Here's a one-line solution that doesn't require importing any libraries.
It replaces non-alphanumeric characters (like punctuation) with spaces, and then splits the string.

Inspired by "Python strings split with multiple separators"

>>> s = 'Johnny.Appleseed!is:a*good&farmer'
>>> words = ''.join(c if c.isalnum() else ' ' for c in s).split()
>>> words
['Johnny', 'Appleseed', 'is', 'a', 'good', 'farmer']
>>> len(words)
6

score 3 · Answer 3 · answered Dec 01 '13 at 19:14

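Try this: it extracts the words with `re`, then builds a dictionary mapping each word to its number of appearances.

import re
string = "Johnny.Appleseed!is:a*good&farmer"  # example input from the question
word_list = re.findall(r"[\w']+", string)
print({word: word_list.count(word) for word in word_list})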
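
Calling `word_list.count(word)` inside the comprehension rescans the whole list for every word. A single-pass sketch of the same idea with a plain dict follows; the sample `text` is an assumption for illustration.

```python
import re

text = "the cat saw the dog, and the dog saw the cat's toy"

word_list = re.findall(r"[\w']+", text)

counts = {}
for word in word_list:
    # dict.get returns 0 the first time a word is seen
    counts[word] = counts.get(word, 0) + 1

print(counts)
print(len(word_list))   # total number of words
```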

Dotan


score 3 · Answer 4 · answered Jul 09 '15 at 20:25

How about using `Counter` from `collections`?

import re
from collections import Counter

string = "Johnny.Appleseed!is:a*good&farmer"  # example input from the question
words = re.findall(r'\w+', string)
print(Counter(words))
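
To get back to the original question (the total number of words), the values of the `Counter` can be summed. Here is a short sketch; the input extends the question's example with two repeated words, and the variable names are illustrative.

```python
import re
from collections import Counter

text = "Johnny.Appleseed!is:a*good&farmer is good"

counts = Counter(re.findall(r'\w+', text))

total_words = sum(counts.values())      # every occurrence, here 8
distinct_words = len(counts)            # unique words, here 6
print(total_words, distinct_words)
print(counts.most_common(2))            # the two most frequent words with their counts
```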

sweet_sugar


score 1 · Answer 5 · answered Jul 06 '13 at 23:08

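def count_words(strings):
    # Characters from the question's example; add any other punctuation to treat as a separator.
    for ltr in ('!', '.', ':', '*', '&'):
        strings = strings.replace(ltr, ' ')  # reassign so each replacement builds on the last
    return len(strings.split())  # split() with no argument ignores repeated spaces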

Rushy Panchal


score 1 · Answer 6 · answered Jun 05 '14 at 01:11

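I know that this is an old question, but how about this?

string = "If Johnny.Appleseed!is:a*good&farmer"

a = ["*", ":", ".", "!", ",", "&", " "]  # characters treated as separators
new_string = ""

for i in string:
    if i not in a:
        new_string += i
    else:
        new_string += " "

# split() with no argument avoids counting empty strings between adjacent separators
print(len(new_string.split()))  # 7 for this input, since "If" is also a word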
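
Another import-free way to apply a list of separators like the one above is `str.translate`, which replaces all of them in one pass. This is a sketch, assuming the separator set below (taken from this answer's list) is the one you want; the names `separators` and `table` are illustrative.

```python
text = "If Johnny.Appleseed!is:a*good&farmer"

separators = "*:.!,&"
# Map every separator character to a space; str.maketrans is a built-in static method.
table = str.maketrans(separators, " " * len(separators))

words = text.translate(table).split()
print(words)
print(len(words))   # 7, because "If" is part of this input
```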

score 0 · Answer 7 · answered Jul 26 '18 at 13:18

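# Write a Python script to count words in a given string.
s = input("Enter a string: ")   # input() already returns a str in Python 3
words = s.split()               # splits on whitespace only; punctuation is not treated as a separator
count = 0
for word in words:
    count += 1

print(f"total number of words in the string is: {count}")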

alien ware

