0

I was looking at: Python regex find all overlapping matches? and the re.finditer didn't work for me. I don't want to download yet another module (i.e. regex) to replace the built-in re. I thought I could write my own but my understanding of while loops is limited.

I'm trying to make a search wrapper that finds all patterns even if they are overlapping (one thing the current findall does not do in re).

I've also never tried to program something like this so instead of using a built-in module for this, I would like to try and construct my own function so I can learn how to program dynamically.

def findall_positions(pattern, sequence):
    positions = list()
    cont = True

    while cont == True:
        match = re.search(pattern, sequence)
        if match is not None:
            positions.append(match.start())
            sequence = sequence[positions[-1]:]
        if match is None:
            cont = False
    return positions

findall_positions("AB","ABBABBABBBA")

My logic was to do re.search and if there is a hit, append the start to positions, then get next stretch of the string after that first re.search match and iterate through this until match is None. However, I'm getting an infinite loop. How can I restructure this to get the correct results? I'm looking for an output of [0,3,6]

Community
  • 1
  • 1
O.rka
  • 29,847
  • 68
  • 194
  • 309

1 Answers1

2

The issue with above approach is that the start of first match is at offset 0 and thus on the first iteration positions is [0]. Then sequence = sequence[positions[-1]:] naturally results to original string and thus you have infinite loop.

You could have a separate variable to keep track of the offset and construct new string every time you do re.search. If match is found then adjust the position accordingly:

import re

def findall_positions(pattern, sequence):
    positions = list()
    cont = True
    offset = 0

    while cont == True:
        match = re.search(pattern, sequence[offset:])
        if match is not None:
            positions.append(match.start() + offset)
            offset = positions[-1] + 1
        if match is None:
            cont = False
    return positions

print(findall_positions("AB","ABBABBABBBA"))

Output:

[0, 3, 6]
niemmi
  • 17,113
  • 7
  • 35
  • 42