Finding words in a string

Question

I searched for this, but all the answers were beyond my scope since I am still learning. I am trying to find the number of occurrences of a word within a string. I came up with the following code but keep getting weird answers.

s = 'bobzbobz'
word = 'bob'

index = 0
instance = []
while index < len(s):
    instance.append(s.find(word,index))
    index += 1
print len(instance) #instance = [0, 4, 4, 4, 4, -1, -1, -1]???? Why??

This should print 2, but I get 8. The reason is that I get a lot of repeated values in my list of instances.

If that's all you want, `s.count(word)` will do it for you :-) — alexis, Sep 09 '15 at 21:02
You get -1's added because find returns -1 when the substring is not found. You keep getting 4's because you only move the index one place, so you keep finding the last bob — Padraic Cunningham, Sep 09 '15 at 21:04
To see what's going wrong with your code, consider what happens when `index` is 1... — alexis, Sep 09 '15 at 21:04
possible duplicate of [Count occurrence of a character in a string](http://stackoverflow.com/questions/1155617/count-occurrence-of-a-character-in-a-string) — taesu, Sep 09 '15 at 21:05

Padraic Cunningham · Answer 1 · 2015-09-09T21:29:54.110

There are much better ways, but to fix your code you need to check for when find returns -1 (no match) and, when you do get a match, move index to the position find returned. As written, you always append regardless of the result and only move one index at a time, so if there is a later match you keep finding the same substring start index:

s = 'bobobzbobz'
word = 'bob'

index = 0
instance = []
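while index <= len(s) - len(word):  # <= so a match at the very end is checked
    f = s.find(word, index)
    if f == -1:
        break  # no more matches, stop early
    instance.append(f)
    index = f + 1  # resume one past the match start so overlaps are found
print(instance)  # prints [0, 2, 6]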

You cannot use .count when you want to count overlapping substrings, as in the example above, because .count only counts non-overlapping occurrences.
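To make the difference concrete, here is a small sketch comparing the two. The helper name `find_overlapping` is my own (it is not a standard library function); it just wraps the same find-based idea in a function, resuming one character past each match.

```python
def find_overlapping(s, word):
    """Return the start index of every occurrence of word in s, overlaps included."""
    positions = []
    index = s.find(word)
    while index != -1:
        positions.append(index)
        # Resume one character past the match start so overlaps are seen.
        index = s.find(word, index + 1)
    return positions


s = 'bobobzbobz'
word = 'bob'

print(s.count(word))                   # counts non-overlapping matches only
print(len(find_overlapping(s, word)))  # counts overlapping matches too
```

For this string the two numbers differ, because the second bob starts inside the first one and .count skips it.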

To break it down per iteration using s = 'bobzbobz':

s.find(word, 0) -> 0   # s[index:] -> 'bobzbobz'
s.find(word, 1) -> 4   # s[index:] -> 'obzbobz'
s.find(word, 2) -> 4   # s[index:] -> 'bzbobz'
s.find(word, 3) -> 4   # s[index:] -> 'zbobz'
s.find(word, 4) -> 4   # s[index:] -> 'bobz'
s.find(word, 5) -> -1  # s[index:] -> 'obz'
s.find(word, 6) -> -1  # s[index:] -> 'bz'
s.find(word, 7) -> -1  # s[index:] -> 'z'

You get four 4's in your output list because, from index 1 to index 4, find locates the substring word starting at index 4. After that, the index has moved past the last bob (the one at index 4), so find returns -1 each time and a -1 gets appended.
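If you want to check the trace yourself, the original loop boils down to one find call per starting index, which can be written as a list comprehension. This is only a sketch for inspecting the values; the names `results` and `matches` are my own.

```python
s = 'bobzbobz'
word = 'bob'

# Same calls as the original loop: one find() per starting index.
results = [s.find(word, index) for index in range(len(s))]
print(results)

# Dropping the -1s and the duplicates leaves the real match positions.
matches = sorted(set(r for r in results if r != -1))
print(matches)
```

The first list has one entry per index, which is why its length is 8 and not the number of matches.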

alexis · Answer 2 · 2015-09-10T07:26:47.027

Each call to find will scan the rest of the string, looking for occurrences that start further down. So you keep matching the same occurrence of bob until you advance past it. To check each position exactly once, use a different method. One possibility is simply using startswith():

n = 0
for i in range(len(s)):
    if s[i:].startswith("bob"):
        n += 1

Or equivalently:

n = 0
for i in range(len(s)):
    if s.startswith("bob", i):
        n += 1

There are all sorts of alternatives to using a loop, but I think this is nice and clear.
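For reference, here are two of those loop-free alternatives as a rough sketch. Both count overlapping occurrences too. The variable names `n_sum` and `n_re` are just for illustration, and the regular expression version assumes you are fine with importing `re` for this.

```python
import re

s = 'bobzbobz'
word = 'bob'

# Sum over booleans: True counts as 1, so this tallies the matching positions.
n_sum = sum(s.startswith(word, i) for i in range(len(s)))

# A zero-width lookahead matches at every position where word starts,
# without consuming characters, so overlapping occurrences are counted.
n_re = len(re.findall('(?=' + re.escape(word) + ')', s))

print(n_sum, n_re)
```

The `re.escape` call is there so that a word containing characters like `.` or `*` is matched literally.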

Cody Bouche · Answer 3 · Sep 09 '15 at 21:09 · score -1

s = 'bobzbobz'
word = 'bob'
print s.count(word)
