Using a backreference to a group captured in a positive lookbehind inside a positive lookahead in regex?

Question

I would like to match text that is surrounded by one of two allowed symbols (let's say & and #). Whichever of the two symbols precedes the text must also follow it; mixing the two is not allowed (e.g. &Time& and #Time# are valid but &Time# is not). I would like to try using lookbehind and lookahead for this by capturing the first symbol in a group. But when I try to do this, the lookbehind and lookahead parts are also included in the match. Is it possible to extract just the text using lookbehind and lookahead with a backreference?

The pattern r"(?<=(&|#))([A-Za-z]+)(?=(\1))" matches the whole string &Hawai&#Rome# instead of extracting just Hawai and Rome.

Why not just `([&#])([A-Za-z]+)\1` and extract group 2? Is this Python? — JvdV, Mar 23 '23 at 07:46
I did that, but I was curious whether it's possible with lookahead, lookbehind and a backreference. Beginner in regex, trying to experiment as much as possible :) (Yes, Python) — Polishko, Mar 23 '23 at 07:58

score 0 · Answer 1 · answered Mar 23 '23 at 08:00

In your current pattern you are using a 3rd, unnecessary capture group. Keeping the capture of the symbol inside the lookbehind, so that \1 refers to it, you could use (?<=([&#]))([A-Za-z]+)(?=\1).
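To see what the lookaround version actually returns, here is a minimal sketch using Python's built-in re module and the sample string from the question. It assumes the symbol is captured inside the lookbehind, i.e. the pattern (?<=([&#]))([A-Za-z]+)(?=\1); the variable names are only illustrative.

```python
import re

s = '&Hawai&#Rome#'

# Group 1 is captured inside the lookbehind, so \1 in the lookahead
# must be the same symbol that precedes the text.
pattern = re.compile(r'(?<=([&#]))([A-Za-z]+)(?=\1)')

# Lookarounds are zero-width: the overall match (group 0) is only the text.
whole_matches = [m.group(0) for m in pattern.finditer(s)]
print(whole_matches)

# findall() reports the capture groups instead, one tuple per match.
groups = pattern.findall(s)
print(groups)

# Mismatched symbols around the text produce no match.
mismatched = pattern.findall('&Time#')
print(mismatched)
```

So the match itself does exclude the symbols; they only show up because findall() reports capture groups rather than the overall match.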

However, since re.findall() in Python returns the capture groups rather than the whole match whenever the pattern contains groups, I think you might as well drop the lookarounds, match the symbols directly, and pick out the 2nd capture group with a list comprehension. The pattern becomes:

([&#])([A-Za-z]+)\1

See an online demo. In code:

import re

s = '&Hawai&#Rome#'
# findall() yields (symbol, text) tuples; keep only the text (2nd group)
words = [x[1] for x in re.findall(r'([&#])([A-Za-z]+)\1', s)]
print(words)

Prints:

['Hawai', 'Rome']
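One behavioural difference between the consuming pattern and a lookaround variant may matter in practice: a consumed symbol cannot be reused by the next match, while a lookaround leaves it available. The sketch below illustrates this with a made-up input, &a&b&, that is not from the question; the lookaround pattern shown assumes the symbol is captured inside the lookbehind.

```python
import re

# Hypothetical input in which neighbouring words share a symbol.
s = '&a&b&'

# Consuming version: the closing & of "a" is used up by the first match.
consuming = re.findall(r'([&#])([A-Za-z]+)\1', s)

# Lookaround version: symbols are only inspected, never consumed.
lookaround = re.findall(r'(?<=([&#]))([A-Za-z]+)(?=\1)', s)

consuming_words = [g[1] for g in consuming]
lookaround_words = [g[1] for g in lookaround]
print(consuming_words)   # ['a']
print(lookaround_words)  # ['a', 'b']
```

For input like the question's &Hawai&#Rome#, where every word has its own pair of symbols, both patterns give the same result.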
