1

Possible Duplicate:
Find all occurrences of a substring in Python

I have a string of numbers and am trying to find each time a certain string of numbers occurs in the string.

I know that if I use, for example, numString.find(str), it will tell me the first time it occurs. Is there any way to modify this statement to find every time that str occurs, not just the first?

user1294377
  • Alright, thanks. I haven't yet learned about regular expressions so I will have to write some code to get around it. – user1294377 Jul 11 '12 at 19:22
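For readers who are comfortable with regular expressions, here is a minimal sketch of that route using the standard `re` module. The function name `find_all_regex` and its `overlapping` parameter are illustrative assumptions, not something from the thread; `re.escape` is used so the search string is treated as literal text, and a lookahead is one common way to pick up overlapping matches.

```python
import re

def find_all_regex(text, sub, overlapping=False):
    """Return the start index of each occurrence of sub in text (hypothetical helper)."""
    pattern = re.escape(sub)  # treat sub as literal text, not as a regex
    if overlapping:
        # A zero-width lookahead matches at a position without consuming it,
        # so matches that overlap each other are all reported.
        pattern = "(?=" + pattern + ")"
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_regex("1212121", "121"))                    # [0, 4]
print(find_all_regex("1212121", "121", overlapping=True))  # [0, 2, 4]
```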

3 Answers

1

You can use recursion:

find() takes an optional second argument that gives the starting index for the search, so on each call you can set that argument to the index returned by the previous find() plus 1.

>>> strs='aabbaabbaabbaabbaa'
>>> def ret(x,a,lis=None,start=0):
    if lis is None:
        lis=[]
    index=x.find(a,start)   # search once, starting at the current offset
    if index!=-1:
        lis.append(index)
        return ret(x,a,lis=lis,start=index+1)
    return lis

>>> ret(strs,'aa')
[0, 4, 8, 12, 16]

>>> ret(strs,'bb')
[2, 6, 10, 14]
Ashwini Chaudhary
  • Python has a maximum recursion depth of 1000 by default, so this will fail with a `RuntimeError` if there are 1000 or more matches. Also, function calling is expensive, and it's trivial to rewrite this as a `while` loop, thus making it more efficient. – Lauritz V. Thaulow Jul 11 '12 at 19:29
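The `while` loop rewrite mentioned in the comment could look something like the sketch below. The name `find_all_iterative` is an assumption made for illustration; the logic mirrors the recursive answer (restart at the previous match plus 1, so overlapping matches are kept) but uses no recursion. In current Python 3 the recursive version would raise `RecursionError`, which is a subclass of `RuntimeError`.

```python
def find_all_iterative(text, sub):
    """Return the start index of every match of sub in text, overlapping ones included."""
    indices = []
    index = text.find(sub)
    while index != -1:
        indices.append(index)
        # Resume one character past the last match, as the recursive version does.
        index = text.find(sub, index + 1)
    return indices

print(find_all_iterative('aabbaabbaabbaabbaa', 'aa'))  # [0, 4, 8, 12, 16]
print(len(find_all_iterative('a' * 5000, 'a')))        # 5000
```

The second call finds 5000 matches, well past the default recursion limit, without any change to interpreter settings.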
1

Well, if regexp is out of the question, consider this generator code:

def find_all(target, substring):
    current_pos = target.find(substring)
    while current_pos != -1:
        yield current_pos
        current_pos += len(substring)
        current_pos = target.find(substring, current_pos)

We use find's optional argument for the start index of the search, and each time resume from the last match plus the length of the substring (so we don't get the same result every time). If you want overlapping matches, advance by 1 instead of len(substring).

You can call list(find_all('abbccbb', 'bb')) to get an actual list of indices.
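To make the difference between the two step sizes concrete, here is a sketch of the same generator with the choice exposed as a flag. The `overlapping` parameter and the empty-substring check are additions for illustration, not part of the answer's code; the check is there because a step of len(substring) would be 0 for an empty substring and the loop would never advance.

```python
def find_all(target, substring, overlapping=False):
    """Yield the start index of each occurrence of substring in target."""
    if not substring:
        # Assumed guard: with an empty substring the non-overlapping step is 0.
        raise ValueError("substring must not be empty")
    step = 1 if overlapping else len(substring)
    pos = target.find(substring)
    while pos != -1:
        yield pos
        pos = target.find(substring, pos + step)

print(list(find_all('abbccbb', 'bb')))                 # [1, 5]
print(list(find_all('aaaa', 'aa')))                    # [0, 2]
print(list(find_all('aaaa', 'aa', overlapping=True)))  # [0, 1, 2]
```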

Just a side note: generators (aka, the yield keyword) are more memory efficient than plain lists, and while loops have far less overhead than recursion (and are also much easier to read if you are a human being).

Ohad
  • This is the solution I was about to write, with the exception of defaulting to simply incrementing the current_pos, in order to handle matches that overlap, as you mention. The performance penalty is not large :-) – Cory Dolphin Jul 11 '12 at 23:06
0

Not the most efficient way of doing it .. but it's a one-liner!! If that counts .... :)

>>> s = "akjdsfaklafdjfjad"
>>> sorted(n for n in set(s.find('a',x) for x in range(len(s))) if n >= 0)
[0, 6, 9, 15]
Maria Zverina