I searched for this, but all the answers were beyond my scope since I am still learning. I am trying to find the number of occurrences of a word within a string. I came up with the following code but keep getting weird answers.

s = 'bobzbobz'
word = 'bob'

index = 0
instance = []
while index < len(s):
    instance.append(s.find(word, index))
    index += 1
print(len(instance))  # instance = [0, 4, 4, 4, 4, -1, -1, -1]???? Why??

This should print 2, but I get 8, because my list of instances contains a lot of repeated values.

halfer
MJ49

3 Answers

0

There are much better ways, but to fix your code you need to do two things: check whether find returned -1 (no match) before appending, and after a match move index to the position find returned. As written, you append on every iteration and advance index by only one, so as long as a later match exists you keep recording the same start index:

s = 'bobobzbobz'
word = 'bob'

index = 0
instance = []
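while index <= len(s) - len(word):  # <= so the last possible start position is checked too
    f = s.find(word, index)
    if f != -1:
        instance.append(f)
        index = f  # jump to the start of the match just found
    index += 1     # step one past it so overlapping matches are still seen
print(instance)  # [0, 2, 6]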

You cannot use .count when you want to count overlapping substrings, as in the example above, because str.count only counts non-overlapping occurrences.
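To make the difference concrete, here is a minimal sketch, assuming Python 3. The helper name count_overlapping is made up for illustration and is not part of the answer's code; it restarts the search one character after each match start, which is the same idea as the loop above.

```python
def count_overlapping(s, word):
    """Count occurrences of word in s, allowing matches to overlap."""
    count = 0
    index = s.find(word)
    while index != -1:
        count += 1
        # Restart one character after the match start so overlaps are seen.
        index = s.find(word, index + 1)
    return count


s = 'bobobzbobz'
word = 'bob'
print(s.count(word))               # non-overlapping matches only
print(count_overlapping(s, word))  # overlapping matches included
```

For this string the two numbers differ, because the 'bob' starting at index 2 shares a character with the one starting at index 0.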

To break it down per iteration using s = 'bobzbobz':

s.find(word, 0) ->  0  # s[0:] -> 'bobzbobz'
s.find(word, 1) ->  4  # s[1:] -> 'obzbobz'
s.find(word, 2) ->  4  # s[2:] -> 'bzbobz'
s.find(word, 3) ->  4  # s[3:] -> 'zbobz'
s.find(word, 4) ->  4  # s[4:] -> 'bobz'
s.find(word, 5) -> -1  # s[5:] -> 'obz'
s.find(word, 6) -> -1  # s[6:] -> 'bz'
s.find(word, 7) -> -1  # s[7:] -> 'z'

You get four 4s in your output list because, for every index from 1 to 4, find locates the substring that starts at index 4. After that, index has moved past the last bob (the one at index 4), so find returns -1 each time and that is what gets appended.
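If you want to reproduce that trace yourself, a short sketch along these lines should do it (Python 3 assumed; the names results and positions are only illustrative):

```python
s = 'bobzbobz'
word = 'bob'

# One find() call per start index, exactly as the question's loop does.
results = [s.find(word, index) for index in range(len(s))]
print(results)  # [0, 4, 4, 4, 4, -1, -1, -1]

# Dropping the -1 entries and the duplicates leaves the real match positions.
positions = sorted(set(r for r in results if r != -1))
print(positions)  # [0, 4]
```

Filtering after the fact works, but it still calls find once per character, so skipping ahead after each match is the cleaner fix.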

Padraic Cunningham
0

Each call to find will scan the rest of the string, looking for occurrences that start further down. So you keep matching the same occurrence of bob until you advance past it. To check each position exactly once, use a different method. One possibility is simply using startswith():

n = 0
for i in range(len(s)):
    if s[i:].startswith("bob"):
        n += 1

Or equivalently:

n = 0
for i in range(len(s)):
    if s.startswith("bob", i):
        n += 1

There are all sorts of alternatives to using a loop, but I think this is nice and clear.
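For completeness, two loop-free variants might look like the sketch below. This assumes Python 3 and the standard re module; the variable names n_sum and n_re are just placeholders. Both count overlapping matches, like the loops above.

```python
import re

s = 'bobzbobz'
word = 'bob'

# Generator expression: one startswith() check per start position.
# True counts as 1 when summed.
n_sum = sum(s.startswith(word, i) for i in range(len(s)))

# Regex lookahead: a zero-width match, so overlapping hits are counted too.
# re.escape guards against special characters in word.
n_re = len(re.findall('(?=' + re.escape(word) + ')', s))

print(n_sum, n_re)
```

Whether either of these is clearer than the explicit loop is a matter of taste; the loop is probably easier to follow while you are still learning.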

alexis
-1
s = 'bobzbobz'
word = 'bob'
print(s.count(word))  # prints 2
Cody Bouche