1

This is a follow-up question to "How to count characters in a string?" and to "Find out how many times a regex matches in a string in Python".

I want to count all alphabet characters in the string:

'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'

The str.count() method allows for counting a specific letter. How would one do that for counting any letter in the entire alphabet in a string, using the count method?

I am trying to use a regex inside the count method, but it returns 0 instead of 83. The code I am using is:

import re

spam_data['text'][0].count((r'[a-zA-Z]'))

When I use:

len(re.findall((r'[a-zA-Z]'), spam_data['text'][0])) it returns a length of 83.

Why does count return 0 here?

ZakS

4 Answers

2

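Use the pandas .str.count accessor on the Series instead of the built-in count method on a single string. The accessor treats its argument as a regular expression. Note that \w also matches digits and underscores, so use r'[a-zA-Z]' if only letters should be counted.

spam_data['text'].str.count(r'\w')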

0    83
Name: text, dtype: int64

To access the first value use:

spam_data['text'].str.count(r'\w')[0]
83
Abhi
  • Do you know why `.str.count('\w')` works for spam_data['text'].str.count('\w'), i.e. a dataframe column, but not for an indexed Series created from spam_data['text']? – ZakS Oct 27 '18 at 18:28
  • It's not clear what you meant here. Maybe an example code to state the issue? – Abhi Oct 28 '18 at 06:02
  • Hi @Abhi, if it's possible to look here I'd be grateful! https://stackoverflow.com/questions/53026049/when-does-str-count-w-work-and-when-doesnt-it?noredirect=1#comment92953894_53026049 – ZakS Oct 28 '18 at 10:33
2

How would one do that for counting any letter in the entire alphabet in a string, using the count method?

>>> wrd = 'Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...'
>>> count = sum([''.join({_ for _ in wrd if _.isalpha()}).count(w) for w in wrd])
>>> count
83

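Explanation: build the set of unique letters in wrd, then, in a list comprehension, add 1 for every character of wrd that appears in that set.
The loop below reaches the same total the other way around, by summing wrd.count(w) over the unique letters: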

count = []
set_w = set()
for w in wrd:
    if w.isalpha():
        set_w.add(w)

for w in set_w:
    count.append(wrd.count(w))

print(sum(count))
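
If the count method itself is not a hard requirement, the same total can be sketched more directly by testing each character. This is only an illustration: the helper name count_letters is made up, and str.isalpha() accepts any Unicode letter, so it is slightly broader than the [a-zA-Z] pattern in the question (for this sample string the two agree).

```python
def count_letters(text):
    """Count alphabetic characters in text (hypothetical helper name)."""
    # str.isalpha() is True for any Unicode letter, not only a-z / A-Z.
    return sum(1 for ch in text if ch.isalpha())


wrd = ('Go until jurong point, crazy.. Available only in bugis n great '
       'world la e buffet... Cine there got amore wat...')
print(count_letters(wrd))  # 83 for this string
```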
deadvoid
1

Short answer: you did not use a regex but a raw string literal, so count looks for occurrences of the literal string '[a-zA-Z]'.

A string of the form r'..' is not a regex; it is a raw string literal. If you write r'\n', you get a string with two characters, a backslash and an n, not a newline. Raw strings are useful in the context of regexes because regex syntax relies heavily on backslashes, which a raw literal leaves untouched.

For example:

>>> r'\n'
'\\n'
>>> type(r'\n')
<class 'str'>

Here, then, you count the number of times the string '[a-zA-Z]' occurs, and unless your spam_data['text'][0] literally contains a square bracket [ followed by a, etc., the count will be zero. As the documentation of str.count [Python-doc] puts it:

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
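
A small sketch of that difference, using made-up sample strings: the built-in count only finds the pattern text where it appears literally, while the re module interprets the same string as a character class.

```python
import re

pattern = r'[a-zA-Z]'

# str.count looks for the eight literal characters "[a-zA-Z]".
plain = 'Go until jurong point'
literal_count_plain = plain.count(pattern)

# A string that really contains the text "[a-zA-Z]" once.
contains_literal = 'use [a-zA-Z] here'
literal_count_contains = contains_literal.count(pattern)

# The re module treats the same string as a regex character class.
regex_count = len(re.findall(pattern, plain))

print(literal_count_plain, literal_count_contains, regex_count)  # 0 1 18
```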

In case the string is rather large, and you do not want to construct a list of matches, you can count the number of elements with:

sum(1 for _ in re.finditer('[a-zA-Z]', 'mystring'))

It is, however, typically faster to simply use re.findall(..) and take the length of the resulting list.
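
Both variants side by side, as a rough sketch. The function names and the precompiled LETTER pattern are illustrative choices, not part of the question; the two functions should always agree on the count and differ only in whether a list of matches is built.

```python
import re

# Precompiled pattern for ASCII letters (illustrative name).
LETTER = re.compile(r'[a-zA-Z]')


def count_with_finditer(text):
    # Consumes matches one at a time; no list of matches is built.
    return sum(1 for _ in LETTER.finditer(text))


def count_with_findall(text):
    # Builds the full list of matches, then takes its length.
    return len(LETTER.findall(text))


sample = 'Cine there got amore wat...'
print(count_with_finditer(sample), count_with_findall(sample))  # 20 20
```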

Willem Van Onsem
1

In this one:

spam_data['text'][0].count((r'[a-zA-Z]'))

count takes a plain substring, not a regex, so it searches for the literal text [a-zA-Z] and returns 0.

Use your second example, the one with re.findall, instead.

BladeMight