Let's say we have this:

A2 A1 B.         #1

A1 B.            #2

A3 A1 A8 B.      #3

How would I go about this if I want:

  1. To match: A2 A1 B. and A1 B.
  2. To match: A1 B.
  3. To match: A3 A1 A8 B. and A1 A8 B. and A8 B.

So far I've got this regex:

A\d\s(.*\.)

But it won't match substrings of text that has already been matched (I'm matching using re.finditer). My guess is that re.finditer is doing just what it's supposed to, and I'm trying to force it into doing something it wasn't designed for.
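To make the behaviour above concrete, here is a minimal sketch (Python 3 assumed; the names `pattern`, `text` and `matches` are only illustrative). It runs the regex from the question over the third sample line and shows that re.finditer yields a single match, because scanning resumes where the previous match ended:

```python
import re

# The pattern from the question: "A<digit> " followed by everything up to the last dot.
pattern = re.compile(r"A\d\s(.*\.)")

text = "A3 A1 A8 B."

# finditer resumes scanning where the previous match ended, so once the
# first match has consumed the whole string there is nothing left to find.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)  # ['A3 A1 A8 B.']
```

The shorter candidates "A1 A8 B." and "A8 B." lie entirely inside the text already consumed by the first match, so they are never reported.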

demongolem
Olian04

1 Answer

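You can use a lookahead for this and capture the value inside the lookahead. Because a lookahead consumes no characters, the regex engine retries at every position in the string, so overlapping matches are all reported: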

regex = r"(?=((?:A\d+\s+)+B\.))"

RegEx Description:

(?=               # start lookahead
   (              # start capturing group #1
      (?:         # start non-capturing group
         A\d+\s+  # match A followed by 1 or more digits followed by 1 or more whitespace characters
      )+          # end non-capturing group; repeat it 1 or more times
      B\.         # match B and a literal dot
   )              # end capturing group #1
)                 # end lookahead

Code:

>>> import re
>>> regex = r"(?=((?:A\d+\s+)+B\.))"

>>> print(re.findall(regex, 'A2 A1 B.'))
['A2 A1 B.', 'A1 B.']

>>> print(re.findall(regex, 'A1 B.'))
['A1 B.']

>>> print(re.findall(regex, 'A3 A1 A8 B.'))
['A3 A1 A8 B.', 'A1 A8 B.', 'A8 B.']
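
Since the question uses re.finditer rather than re.findall, the same pattern can be sketched with it as well. This is only an illustration (Python 3 assumed; the helper name `find_overlapping` is hypothetical). The overall match of a lookahead is empty, so the text has to be read from group 1:

```python
import re

# Same lookahead pattern as above; the matched text lives in group 1.
overlapping = re.compile(r"(?=((?:A\d+\s+)+B\.))")

def find_overlapping(text):
    """Return (start_index, matched_text) for every overlapping match."""
    # m.group(0) is always the empty string here, so use group 1 instead.
    return [(m.start(1), m.group(1)) for m in overlapping.finditer(text)]

print(find_overlapping("A3 A1 A8 B."))
# [(0, 'A3 A1 A8 B.'), (3, 'A1 A8 B.'), (6, 'A8 B.')]
```

Using finditer this way also gives access to the position of each match, which findall does not report.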
anubhava