
Is it possible to get all overlapping matches that start at the same index but come from different matching groups?

For example, when I search "ABC" for the pattern "(A)|(AB)", the regex should return the following matches:

(0,"A") and (0,"AB")
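For reference, a plain `re.finditer` call shows why this needs special handling. Alternatives are tried left to right, and only one match is reported per starting position, so `(AB)` never shows up. This is a minimal sketch using only Python's standard `re` module:

```python
import re

# Alternation is ordered: at index 0 "(A)" succeeds first, so "(AB)" is never
# reported there, and the scan resumes at index 1, where neither alternative matches.
matches = [(m.start(), m.group()) for m in re.finditer(r"(A)|(AB)", "ABC")]
print(matches)  # [(0, 'A')]
```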

Mikael Lepistö
  • Actually, this is still open; I wasn't very clear about what kind of matches I was looking for. Another example: for "AABAABA" I would like to get the matches (0,['A']), (1,['A','AB']), (3, ['A']), (4, ['A','AB']), (6, ['A']). – Mikael Lepistö May 23 '11 at 22:22
  • This Q&A is doubly useful because, strangely, it answers two questions, one of which is a misreading of the original but is nevertheless useful. – n611x007 Apr 07 '13 at 12:55

2 Answers


For one interpretation, see Evpok's answer. A second interpretation of your question is that you want to match all patterns at the same time from the same position. You can use lookahead expressions in this case, e.g. the regular expression

(?=(A))(?=(AB))

will give you the desired result (i.e. all positions where both patterns match, together with their captured groups).
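As a quick check, here is that regex run on the example string from the question's comment, as a sketch using Python's standard `re` module. Because both lookaheads are zero-width, each match is empty, and the captured groups carry what `(A)` and `(AB)` matched at that index:

```python
import re

text = "AABAABA"
# Each match is empty; m.groups() holds the text each lookahead captured.
both = [(m.start(), m.groups()) for m in re.finditer(r"(?=(A))(?=(AB))", text)]
print(both)  # [(1, ('A', 'AB')), (4, ('A', 'AB'))]
```

Only indices 1 and 4 are reported, since those are the only places where both patterns match.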

Update: With the additional clarification, this can still be done with a single regex. You just have to make both lookaheads above optional and let the regex consume one of the alternatives, i.e.

(?=(A))?(?=(AB))?(?:(?:A)|(?:AB))
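A sketch of how the optional-lookahead regex might be used from Python's `re` module, on the string from the question's comment. Group 1 and group 2 hold what `(A)` and `(AB)` matched at each start index; a group stays `None` where its lookahead could not match and was skipped. Treat this as something to verify on your own Python version:

```python
import re

text = "AABAABA"
# Each optional lookahead records its group when it can match at this index;
# the trailing alternation then consumes one "A" so finditer moves forward.
pattern = re.compile(r"(?=(A))?(?=(AB))?(?:(?:A)|(?:AB))")
hits = [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(hits)
```

On this input it should report a match at every "A", with the second group filled in only at indices 1 and 4.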

Nevertheless, I wouldn't recommend doing so. It is much easier to search for each pattern separately and join the results afterwards.

import re
string = "AABAABA"
result = [(g.start(), g.group()) for g in re.compile('A').finditer(string)]
result += [(g.start(), g.group()) for g in re.compile('AB').finditer(string)]
result.sort()  # join: order the (start, match) pairs by start index
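To get exactly the shape asked for in the question's comment, the separate searches can be grouped by start index. This is a minimal sketch using `collections.defaultdict`; the variable names are illustrative:

```python
import re
from collections import defaultdict

string = "AABAABA"
grouped = defaultdict(list)  # start index -> texts matched at that index
for pattern in ("A", "AB"):
    # finditer reports non-overlapping matches of ONE pattern, which is enough
    # here because neither "A" nor "AB" can overlap itself.
    for m in re.finditer(pattern, string):
        grouped[m.start()].append(m.group())
result = sorted(grouped.items())
print(result)  # [(0, ['A']), (1, ['A', 'AB']), (3, ['A']), (4, ['A', 'AB']), (6, ['A'])]
```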
Howard
  • Nice! This seems to do exactly what I wanted. – Mikael Lepistö May 23 '11 at 21:24
  • I added a clarifying comment to the question. This solution indeed gives only the matches where both patterns match. For the case in the comment above, those would be (1, ['A','AB']) and (4, ['A','AB']). – Mikael Lepistö May 23 '11 at 22:26
  • By the way, is `re.compile('AB').finditer(string)` really useful? `re.finditer('AB', string)` should do the trick, or if you absolutely want to compile it beforehand, doing so outside the list comprehension should work too and would make the expression clearer. – Evpok Jun 10 '11 at 12:22
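The two alternatives suggested in that comment can be sketched side by side. Both give the same result; the module-level `re.finditer` relies on `re`'s internal cache of recently compiled patterns, while the second form compiles once and reuses the pattern object:

```python
import re

string = "AABAABA"

# Option 1: module-level function; re caches recently compiled patterns.
direct = [(m.start(), m.group()) for m in re.finditer("AB", string)]

# Option 2: compile once, outside the list comprehension, then reuse it.
ab = re.compile("AB")
precompiled = [(m.start(), m.group()) for m in ab.finditer(string)]

print(direct == precompiled)  # True
```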

I got this from somewhere, though I can't recall where or from whom:

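import re

def myfindall(regex, seq):
    # Collect every match of the compiled `regex` in `seq`, including
    # overlapping ones, but at most one match per start index.
    resultlist = []
    pos = 0
    while True:
        result = regex.search(seq, pos)
        if result is None:
            break
        resultlist.append(result.group())
        # Resume searching one character after this match's start, so the
        # next match may overlap the current one.
        pos = result.start() + 1
    return resultlist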

It returns a list of all matches, including overlapping ones, with the limitation that there is at most one match per start index.
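The one-match-per-index limit is easy to see with a small variant that also records start positions. `overlapping_with_starts` is a hypothetical name for this sketch; the loop is the same as in `myfindall`. Whichever alternative comes first in the pattern wins at each index:

```python
import re

def overlapping_with_starts(regex, seq):
    # Hypothetical variant of myfindall that also records each match's start.
    results = []
    pos = 0
    while True:
        m = regex.search(seq, pos)
        if m is None:
            break
        results.append((m.start(), m.group()))
        pos = m.start() + 1  # allow the next match to overlap this one
    return results

a_first = overlapping_with_starts(re.compile("A|AB"), "AABAABA")
ab_first = overlapping_with_starts(re.compile("AB|A"), "AABAABA")
print(a_first)   # [(0, 'A'), (1, 'A'), (3, 'A'), (4, 'A'), (6, 'A')]
print(ab_first)  # [(0, 'A'), (1, 'AB'), (3, 'A'), (4, 'AB'), (6, 'A')]
```

Either way, only one of 'A' and 'AB' is reported at indices 1 and 4.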

Evpok
  • In the case above, this seems to return just 'A' and not the 'AB' match. – Mikael Lepistö May 23 '11 at 21:34
  • True, matches with no exclusive part break this. By the way, does http://stackoverflow.com/questions/5616822/python-regex-find-all-overlapping-matches/5616910#5616910 work? – Evpok May 23 '11 at 23:41
  • Not really. I tried that earlier, and it doesn't work the way I wanted either. It only finds one match for each index of the matched string. I ended up writing a simple lookup-table-based parser, since this was quite a nasty case to handle with a regex. – Mikael Lepistö May 24 '11 at 07:44
  • I didn't realize Python has a separate `search` function and that `match` only matches at the start of the string: http://docs.python.org/2/library/re.html#search-vs-match – n611x007 Apr 07 '13 at 12:54
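The search-vs-match distinction also suggests a simple non-regex-trick approach: try every pattern at every index with a compiled pattern's `match(string, pos)`, which only succeeds if the match begins exactly at `pos`. This is a hypothetical helper, not the lookup-table parser mentioned above; it returns every pattern that matches at each start index:

```python
import re

def matches_by_start(patterns, text):
    # Hypothetical helper: Pattern.match(text, pos) anchors the attempt at pos,
    # so each pattern contributes at most one match per start index.
    compiled = [re.compile(p) for p in patterns]
    found = []
    for pos in range(len(text)):
        here = [m.group() for m in (r.match(text, pos) for r in compiled) if m]
        if here:
            found.append((pos, here))
    return found

print(matches_by_start(["A", "AB"], "AABAABA"))
# [(0, ['A']), (1, ['A', 'AB']), (3, ['A']), (4, ['A', 'AB']), (6, ['A'])]
```

This costs one match attempt per pattern per index, which is fine for short inputs and a handful of patterns.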