I would like to match text that is surrounded by one of two allowed symbols (let's say & and #). Whichever symbol appears before the text must also appear after it; mixing the two is not allowed (e.g. &Time& and #Time# are valid, but &Time# is not). I would like to try this with a lookbehind and a lookahead, capturing the first symbol in a group. But when I do, the lookbehind and lookahead parts are also included in the result. Is it possible to extract just the text using a lookbehind and a lookahead with a backreference?

r"(?<=(&|#))([A-Za-z]+)(?=(\1))" matches the whole string &Hawai&#Rome# instead of extracting Hawai and Rome.

Polishko
  • Why not just `([&#])([A-Za-z]+)\1` and extract group 2? Is this Python? – JvdV Mar 23 '23 at 07:46
  • I did that, but I was curious whether it's possible with lookahead, lookbehind and a backreference. Beginner in regex, trying to experiment as much as possible :) (Yes, Python) – Polishko Mar 23 '23 at 07:58

1 Answer

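Your current pattern uses a third, unnecessary capture group. You could use (?<=([&#]))([A-Za-z]+)(?=\1) instead, where group 1 (captured inside the lookbehind) holds the opening symbol and \1 requires the same symbol after the text.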
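A minimal sketch of the lookaround approach, assuming the opening symbol is captured as group 1 inside the lookbehind (the names `pattern` and `texts` are only illustrative). It shows that the match itself excludes the lookarounds; it is `findall()` that reports the capture groups:

```python
import re

s = '&Hawai&#Rome#'

# Group 1 is captured inside the lookbehind; \1 in the lookahead
# requires the same symbol to follow the text.
pattern = re.compile(r'(?<=([&#]))([A-Za-z]+)(?=\1)')

# findall() returns one tuple of capture groups per match.
print(pattern.findall(s))  # [('&', 'Hawai'), ('#', 'Rome')]

# finditer() gives access to the full match (group 0), which does
# not include the text consumed by the lookarounds.
texts = [m.group(0) for m in pattern.finditer(s)]
print(texts)  # ['Hawai', 'Rome']
```

With this pattern, a mismatched input such as '&Time#' yields no matches.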

However, since findall() in Python returns a tuple of all capture groups whenever the pattern contains more than one group, I think you might as well drop the lookarounds and pick out the 2nd capture group using a list comprehension, like so:

([&#])([A-Za-z]+)\1

See an online demo. In code:

import re
s = '&Hawai&#Rome#'
l = [x[1] for x in re.findall(r'([&#])([A-Za-z]+)\1', s)]
print(l)

Prints:

['Hawai', 'Rome']
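One behavioural difference between the two patterns is worth checking before choosing. The sketch below uses a made-up input in which two words share a delimiter; the variable names are illustrative only. Because lookarounds do not consume the symbols, the lookaround pattern can reuse the closing symbol of one match as the opening symbol of the next, whereas the consuming pattern cannot:

```python
import re

# Hypothetical input: 'Hawai' and 'Rome' share the middle '&'.
s = '&Hawai&Rome&'

# Consuming pattern: the middle '&' is used up by the first match,
# so 'Rome' has no opening symbol left.
consuming = [x[1] for x in re.findall(r'([&#])([A-Za-z]+)\1', s)]
print(consuming)  # ['Hawai']

# Lookaround pattern: the symbols are only inspected, not consumed,
# so the middle '&' closes 'Hawai' and also opens 'Rome'.
lookaround = [m.group(0)
              for m in re.finditer(r'(?<=([&#]))([A-Za-z]+)(?=\1)', s)]
print(lookaround)  # ['Hawai', 'Rome']
```

Which result is wanted depends on whether adjacent items are allowed to share a delimiter.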
JvdV