Finding occurrences of a word in a string in python 3

Question

I'm trying to find the number of occurrences of a word in a string.

word = "dog"
str1 = "the dogs barked"

I used the following to count the occurrences:

count = str1.count(word)

The issue is I want an exact match. So the count for this sentence would be 0. Is that possible?

Amber · Accepted Answer · 2013-06-24T06:35:47.850

50

If you're going for efficiency:

import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))

This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.

It also has the benefit of working correctly with punctuation: it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex assertion, which matches at word boundaries (transitions between \w, which is [a-zA-Z0-9_] in ASCII mode, and anything else).

If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages. For many applications this would be an overcomplication, and in many other cases the regex's Unicode matching, which is the default for str patterns in Python 3, would suffice.
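As a rough sketch, the approach above can be wrapped in a small function and checked against the examples discussed here. The function name `count_exact` and the third sample string are illustrative only, not part of the original answer.

```python
import re

def count_exact(word, text):
    """Count whole-word occurrences of `word` in `text` (hypothetical helper)."""
    # re.escape() keeps characters such as "." in `word` from acting as regex syntax.
    pattern = r'\b%s\b' % re.escape(word)
    # finditer() yields matches lazily, so no intermediate list is built.
    return sum(1 for _ in re.finditer(pattern, text))

print(count_exact("dog", "the dogs barked"))      # 0
print(count_exact("dog", "Mike saw a dog."))      # 1
print(count_exact("dog", "dog, dog and hotdog"))  # 2
```

The last call shows that "hotdog" is not counted, because there is no word boundary between "t" and "d".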

edited Jun 24 '13 at 06:35

answered Jun 24 '13 at 06:09

Amber

worked like a charm! Not sure why there's a downvote. Could you explain what exactly's going on or where I could look for this? I've never seen a for loop with an underscore. Thanks! – lost9123193 Jun 24 '13 at 06:26
@lost9123193 `_` is often used as a placeholder in for loops :). I'm sure Amber could explain it better :p – TerryA Jun 24 '13 at 06:27
@lost9123193 - A `_` is simply a dummy variable, a way of saying "I don't actually care about the value here." In this case, I'm using it because we're always summing up `1`s for the count; we don't actually care about the match objects returned from `re.finditer()`. – Amber Jun 24 '13 at 06:29
Also if you're wondering what the `re` bit is - http://docs.python.org/2/library/re.html – Amber Jun 24 '13 at 06:30
For information about the difference between `re.finditer` and `re.findall`, check out this link: https://medium.com/geoblinktech/so-a-few-months-ago-i-had-to-search-the-quickest-way-to-apply-a-regular-expression-to-a-huge-c0883f8d4e4f#a859 – Leland Hepworth Jun 10 '21 at 18:57

grc · Answer 2 · 2013-06-24T06:23:57.297

17

You can use str.split() to convert the sentence to a list of words:

a = 'the dogs barked'.split()

This will create the list:

['the', 'dogs', 'barked']

You can then count the number of exact occurrences using list.count():

a.count('dog')  # 0
a.count('dogs') # 1

If it needs to work with punctuation, you can use regular expressions. For example:

import re
a = re.split(r'\W', 'the dogs barked.')
a.count('dogs') # 1
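To make the behaviour of the `\W` split easier to see, here is a small sketch that prints the token lists it produces. The sample sentences and the `re.findall` alternative are illustrative assumptions, not part of the original answer.

```python
import re

# Splitting on single non-word characters leaves an empty string
# wherever a separator sits at the end of the text.
print(re.split(r'\W', 'the dogs barked.'))   # ['the', 'dogs', 'barked', '']

# An apostrophe is a non-word character, so contractions are split.
print(re.split(r'\W', "I'm here"))           # ['I', 'm', 'here']

# Alternative sketch: collect runs of word characters instead of splitting.
tokens = re.findall(r'\w+', 'the dogs barked.')
print(tokens)                                # ['the', 'dogs', 'barked']
print(tokens.count('dogs'))                  # 1
```

The empty strings do not affect the count of a real word, but they do show up if you reuse the list for anything else.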

edited Jun 24 '13 at 06:23

answered Jun 24 '13 at 06:10

grc

This is probably the simplest method, but do note that it will fail for strings that include punctuation next to the counted word. – Amber Jun 24 '13 at 06:13
A "\W" regexp will fail for any foreign words such as café, which is a drawback. – Lennart Regebro Jun 24 '13 at 06:23
Oh, hey, the Unicode flag is default in Python 3. So yes. But I found another potential issue, "I'm" will be two words, "I" and "m". – Lennart Regebro Jun 24 '13 at 06:37
@LennartRegebro and there's also an issue with hyphenated words. – grc Jun 24 '13 at 06:45
@grc, If you want to count them as one word, yes. That's a matter of taste, I guess. :-) – Lennart Regebro Jun 24 '13 at 06:47

TerryA · Answer 3 · 2013-06-24T07:14:49.680

5

Use a generator expression:

>>> word = "dog"
>>> str1 = "the dogs barked"
>>> sum(i == word for i in str1.split())
0

>>> word = 'dog'
>>> str1 = 'the dog barked'
>>> sum(i == word for i in str1.split())
1

split() returns a list of all the words in a sentence. The generator expression then compares each word with the target, and sum() adds up the resulting True values (each counts as 1) to give the number of times the word appears in the sentence.
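For comparison, a minimal sketch showing that summing the comparisons gives the same result as `list.count()`, which the comments below recommend. The sample sentence and the variable names `by_sum` and `by_count` are made up for illustration.

```python
word = "dog"
str1 = "the dog chased another dog"

words = str1.split()
# Each comparison yields True or False; sum() treats True as 1.
by_sum = sum(w == word for w in words)
# list.count() performs the same equality test on each element.
by_count = words.count(word)

print(by_sum, by_count)  # 2 2
```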

edited Jun 24 '13 at 07:14

answered Jun 24 '13 at 06:09

TerryA

To whomever downvoted this: if you're going to downvote, it's usually a good idea to at least leave a comment explaining why. – Amber Jun 24 '13 at 06:11
@LennartRegebro Does not mean you should downvote the answer. The answer is correct – TerryA Jun 24 '13 at 06:18
@LennartRegebro That's not a useful statement. People who post answers on StackOverflow often want to learn just as much as people who post questions do; useful and actionable feedback is an important part of that. – Amber Jun 24 '13 at 06:21
"It's not a good answer" please tell me how I could improve :) – TerryA Jun 24 '13 at 06:23
@LennartRegebro I am calm; you seem to think that I'm worked up because I'm disapproving of the manner in which you've been responding, but that's not the case. I simply would like to see more constructive interaction. My original comment simply asked for such constructive commenting; you chose to interpret that as impatient when it was nothing of the sort. Either way, this is the last I'll comment in this particular area; I have no desire to draw this out. Feel free to get the last word if you would like. – Amber Jun 24 '13 at 06:25
Both of you calm down. Now, could you please explain why you downvoted (in regards to how "It's not a good answer") – TerryA Jun 24 '13 at 06:28
Your `sum()` implementation is just an inefficient reimplementation of the `count()` method that already exists on lists. Use `.count(word)` instead. – Lennart Regebro Jun 24 '13 at 06:28
@Haidro: This answer is not correct, for a useful definition of correct. This is not a maths test where you get points for having the right number in the end. – Lennart Regebro Jun 24 '13 at 06:30
But I do apologize for not noticing earlier that the impatient one and the one who posted the answer were different people. If I had realized this, I would have given my explanation immediately. Sorry. – Lennart Regebro Jun 24 '13 at 06:31
@Haidro: As a final statement on this: You might want to hover your mouse over the up and down arrows and notice what they say. But otherwise, by all means, go on correcting people who have been members for ten times as long as you on how Stackoverflow works. :-) – Lennart Regebro Jun 24 '13 at 06:45
I like this but you should actually just simplify it to `sum(i == word for i in str1.split())`. [That would be the most pythonic way to do it](http://stackoverflow.com/questions/3174392/is-it-pythonic-to-use-bools-as-ints) – jamylak Jun 24 '13 at 07:13
@jamylak: That relies on int(True) being 1, which may be shorter, but harder to understand than the original. And is still slower than simply calling `.count()`. – Lennart Regebro Jun 24 '13 at 11:20
@LennartRegebro `.count` is better, I agree, "That relies on int(True) being 1" Did you even read the huge highlighted link or not? – jamylak Jun 25 '13 at 11:10
@jamylak: Yes. So? It still means you have to know this and consider it when reading the code. It makes it harder to understand than the original. Claiming it's the most pythonic way to do it is patent nonsense. – Lennart Regebro Jun 25 '13 at 15:33

score 4 · Answer 4 · answered Jun 24 '13 at 09:58

4

import re

word = "dog"
text = "the dogs barked"
# Note: this counts substring matches, so "dogs" also matches "dog".
print(len(re.findall(word, text)))

answered Jun 24 '13 at 09:58

Aaron

The only problem with this is that dogs and dog are two different words; your solution gives 1 as output, but ideally it should give 0. – Alok Prasad Nov 24 '20 at 09:18

Lennart Regebro · Answer 5 · 2013-06-24T06:48:08.880

3

You need to split the sentence into words. For your example you can do that with just

words = str1.split()

But for real world usage you need something more advanced that also handles punctuation. For most western languages you can get away with replacing all punctuation with spaces before doing str1.split().

This will work for English as well in simple cases, but note that "I'm" will be split into two words: "I" and "m", and it should in fact be split into "I" and "am". But this may be overkill for this application.

For other cases, such as Asian languages or actual real world usage of English, you might want to use a library that does the word splitting for you.

Then you have a list of words, and you can do

count = words.count(word)
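The "replace punctuation with spaces, then split" step described above could be sketched as follows. This assumes ASCII punctuation only (via `string.punctuation`); the helper name `split_words` and the sample sentence are hypothetical.

```python
import string

def split_words(text):
    """Replace ASCII punctuation with spaces, then split on whitespace (illustrative helper)."""
    # Map every punctuation character to a space.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table).split()

words = split_words("Mike saw a dog. The dog's owner waved!")
print(words)
print(words.count("dog"))  # 2
```

Note that "dog's" becomes "dog" and "s", which is the same contraction issue mentioned above for "I'm".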

edited Jun 24 '13 at 06:48

answered Jun 24 '13 at 06:12

Lennart Regebro

Haha, now this got downvoted for no reason. I suspect childishness. ;-) But I already have over 20k, so I don't mind, downvote on. – Lennart Regebro Jun 24 '13 at 06:51
OK, I'm glad to hear that. – Lennart Regebro Jun 24 '13 at 06:56

score 1 · Answer 6 · answered Aug 02 '18 at 10:37

# Counting the number of occurrences of a word in the text
def count_word(text, word):
    """
    Split the text into words and count the occurrences of the given word.
    input: text and word
    output: number of times the word appears
    """
    answer = text.split(" ")
    count = 0
    for occurrence in answer:
        if word == occurrence:
            count = count + 1
    return count

sentence = "To be a programmer you need to have a sharp thinking brain"
word_count = "a"
print(sentence.split(" "))
print(count_word(sentence,word_count))

#output
>>> %Run test.py
['To', 'be', 'a', 'programmer', 'you', 'need', 'to', 'have', 'a', 'sharp', 'thinking', 'brain']
2
>>>

Create a function that takes two inputs: a sentence of text and a word. Split the sentence into a list of words, then compare each word in the list with the word to be counted and return the number of matches.
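One detail worth knowing about this approach: `split(" ")` and `split()` behave differently when the text contains repeated spaces. A small sketch, using a made-up sentence:

```python
sentence = "To  be a programmer"  # note the two spaces after "To"

# Splitting on a single space keeps an empty string between the two spaces.
print(sentence.split(" "))  # ['To', '', 'be', 'a', 'programmer']

# Splitting with no argument collapses any run of whitespace.
print(sentence.split())     # ['To', 'be', 'a', 'programmer']
```

Either form gives the same count for an ordinary word, but the argumentless form also handles tabs and newlines.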

score 1 · Answer 7 · answered May 18 '19 at 19:21

If you don't need regular expressions, you can use this neat trick.

word = " is " # Add a space on both the leading and trailing sides.
input_string = "This is some random text and this is str which is mutable"
print("Word count : ",input_string.count(word))
Output -- Word count :  3
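A caveat with the space-padding trick is that a word at the very start or end of the string has no space on one side. One possible workaround, sketched here with a hypothetical helper `padded_count`, is to pad the text as well:

```python
def padded_count(text, word):
    """Count `word` delimited by single spaces (hypothetical helper)."""
    # Padding the text lets a word at the very start or end be matched too.
    return f" {text} ".count(f" {word} ")

text = "is this it is"
print(text.count(" is "))          # 0: both occurrences touch a string edge
print(padded_count(text, "is"))    # 2

# Caveat: count() finds non-overlapping matches, so adjacent repeats
# that share a single space are undercounted.
print(padded_count("is is", "is"))  # 1
```

Punctuation next to the word is still not handled, so this remains a trick for simple, space-separated text.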

score 0 · Answer 8 · edited Nov 04 '15 at 23:41

0

Below is a simple example that replaces the desired word with a new word, for a desired number of occurrences:

def censor(text, word):
    # str.replace(old, new, count) replaces at most `count` occurrences of `old`
    newString = text.replace(word, "+" * len(word), text.count(word))
    return newString

print(censor("hey hey hey", "hey"))

output will be : +++ +++ +++

The first parameter of replace() is the search string. The second is the new string that replaces the search string. The third and last is the maximum number of occurrences to replace.
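To illustrate the third argument of `str.replace()` on its own, here is a short sketch with made-up values:

```python
text = "hey hey hey"

# With a count of 2, only the first two occurrences are replaced.
print(text.replace("hey", "+++", 2))   # +++ +++ hey

# Omitting the count replaces every occurrence.
print(text.replace("hey", "+++"))      # +++ +++ +++
```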

edited Nov 04 '15 at 23:41

Kara

answered Aug 05 '15 at 06:34

abhay goyan

what's `<br>` for ? – RetroCode Nov 11 '16 at 19:57

score 0 · Answer 9 · edited Mar 29 '17 at 01:34

Consider the example s = "suvotisuvojitsuvo". Suppose you want separate counts for "suvo" and "suvojit", where an occurrence of "suvo" inside "suvojit" should not count toward "suvo" (only the lonely "suvo" counts). Start with the count() method:

s = "suvotisuvojitsuvo"
suvocount = s.count("suvo")        # output: 3
suvojitcount = s.count("suvojit")  # output: 1

Then, to find the lonely suvo count, subtract the suvojit count:

lonelysuvo = suvocount - suvojitcount  # output: 3 - 1 -> 2

score 0 · Answer 10 · answered Jul 15 '17 at 19:51

0

This would be my solution, with the help of the comments:

word = input("type the french word chiens in english:")
str1 = "dogs"
# Note: count() matches substrings, so an answer such as "dog" is also accepted.
times = str1.count(word)
if times >= 1:
    print("dogs is correct")
else:
    print("you're wrong")

answered Jul 15 '17 at 19:51

roger

wgetDJ · Answer 11 · 2019-11-27T11:57:54.123

If you want to find the exact number of occurrences of a specific word in the string and you don't want to use any count function, then you can use the following method.

text = input("Please enter the statement you want to check: ")
word = input("Please enter the word you want to check in the statement: ")

# n is the starting point for the search; it is 0 because you want to start from the very beginning of the string.
n = 0

# position_word is the starting Index of the word in the string
position_word = 0
num_occurrence = 0

if word.upper() in text.upper():
    while position_word != -1:
        position_word = text.upper().find(word.upper(), n, len(text))

        # increasing the value of the starting point of the search to find the next word
        n = (position_word + 1)

        # statement.find("word", start, end) returns -1 if the word is not present in the given statement. 
        if position_word != -1:
            num_occurrence += 1

    print (f"{word.title()} is present {num_occurrence} times in the provided statement.")

else:
    print (f"{word.title()} is not present in the provided statement.")
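The loop above counts every case-insensitive substring match, so "dog" would also be found inside "dogs". As a possible variation (an assumption on my part, not the answer's method), the same `find()` loop can check the characters on either side of each match. The function name `count_whole_word` is hypothetical.

```python
def count_whole_word(text, word):
    """Count case-insensitive matches of `word` that are not adjacent to letters or digits."""
    haystack, needle = text.upper(), word.upper()
    if not needle:
        return 0
    count = 0
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        # A match only counts when both neighbours are absent or non-alphanumeric.
        before_ok = start == 0 or not haystack[start - 1].isalnum()
        after_ok = end == len(haystack) or not haystack[end].isalnum()
        if before_ok and after_ok:
            count += 1
        # Resume the search one character after the previous match start.
        start = haystack.find(needle, start + 1)
    return count

print(count_whole_word("the dogs barked", "dog"))    # 0
print(count_whole_word("Dog, dog and DOG.", "dog"))  # 3
```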

score 0 · Answer 12 · answered Sep 10 '20 at 10:34

This is a simple Python program using the split function:

text = 'apple mango apple orange orange apple guava orange'
print("\n My string ==> " + text + "\n")
words = text.split()
seen = []

# Print each distinct word once, together with its count
for w in words:
    if w not in seen:
        seen.append(w)
        print(w, words.count(w))

score 0 · Answer 13 · edited Jun 24 '21 at 16:56

0

I have just started out to learn coding in general and I do not know any libraries as such.

s = "the dogs barked"
value = 0
x = 0
y=3
for alphabet in s:
    if (s[x:y]) == "dog":
        value = value+1
    x+=1
    y+=1
print ("number of dog in the sentence is : ", value)

edited Jun 24 '21 at 16:56

Dharman

answered Jun 24 '21 at 16:51

Bharath R

score 0 · Answer 14 · answered Dec 05 '21 at 08:07

Another way to do this is by tokenizing the string (breaking it into words).

Use Counter from the collections module of the Python standard library:

from collections import Counter

str1 = "the dogs barked"
# This dictionary contains all words and their respective counts
stringTokenDict = dict(Counter(str1.split()))

print(stringTokenDict['dogs'])
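A `Counter` can also be queried directly, without copying it into a plain dict. One practical difference is that a missing word yields 0 instead of raising `KeyError`, which suits the exact-match question. A minimal sketch with a made-up sentence:

```python
from collections import Counter

str1 = "the dogs barked at the other dogs"
counts = Counter(str1.split())

print(counts["dogs"])  # 2
print(counts["dog"])   # 0: a missing key yields 0 rather than raising KeyError
```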
