Python re.sub() is replacing the full match even when using non-capturing groups

Question

I believe that re.sub() is replacing the full match, but in this case I only want to replace the capturing groups and leave the text matched by the non-capturing groups untouched. How can I go about this?

import re

string = 'aBCDeFGH'

print(re.sub('(a)?(?:[A-Z]{3})(e)?(?:[A-Z]{3})', '+', string))

Output is:

+

Expected output is:

+BCD+FGH

Try [`re.sub('[ae]([A-Z]{3})', r'+\1', 'aBCDeFGH')`](http://rextester.com/CUOY83316) – Wiktor Stribiżew, Mar 28 '18 at 07:06
Try `re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string)` – Sohaib Farooqi, Mar 28 '18 at 07:08
That's the way `re.sub` works: it replaces the whole match. If you want to keep portions of the original string, you can always put them back in the replacement string using groups. – Giacomo Alzetta, Mar 28 '18 at 07:09
Also, an alternative is to use a lookahead: `re.sub(r'[a-z](?=[A-Z]{3})', '+', string)`. This matches a single lowercase character only if it is followed by three uppercase ones and replaces it with `+`, which is what you want. – Giacomo Alzetta, Mar 28 '18 at 07:12

score 10 · Accepted Answer · answered Mar 28 '18 at 07:08

The general solution for such problems is using a lambda in the replacement:

import re

string = 'aBCDeFGH'

print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', lambda match: '+%s+%s' % (match.group(2), match.group(4)), string))

However, as bro-grammer has commented, you can use backreferences in this case:

print(re.sub('(a)?([A-Z]{3})(e)?([A-Z]{3})', r'+\2+\4', string))
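To see why the original call collapses everything to a single `+`, it can help to inspect the match object: `re.sub` replaces the span of `group(0)`, and non-capturing groups still consume characters inside that span. A minimal sketch, reusing the string and patterns from the question and from this answer (the variable names are only illustrative):

```python
import re

string = 'aBCDeFGH'

# Non-capturing groups still take part in the overall match (group 0).
m = re.search(r'(a)?(?:[A-Z]{3})(e)?(?:[A-Z]{3})', string)
whole_match = m.group(0)   # the entire span that re.sub would replace
captured = m.groups()      # only the two capturing groups

# Capturing the uppercase runs lets the replacement put them back.
pattern = r'(a)?([A-Z]{3})(e)?([A-Z]{3})'
via_lambda = re.sub(
    pattern,
    lambda match: '+%s+%s' % (match.group(2), match.group(4)),
    string,
)
via_backrefs = re.sub(pattern, r'+\2+\4', string)

print(whole_match, captured, via_lambda, via_backrefs)
```

Both variants rebuild the replacement from the captured uppercase runs, so they should give the same result for this input.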

answered Mar 28 '18 at 07:08

pts


Thanks! This solved my problem. The Python documentation never mentions anything about being able to use a lambda function in re.sub(). – Darwin Mar 28 '18 at 07:18
@Darwin From [the docs](https://docs.python.org/3/library/re.html#re.sub): _"repl can be a string or a function"_. There's even an example. – Aran-Fey Mar 28 '18 at 07:25
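Since `repl` can be any callable that takes a match object and returns a string, a named function works as well as a lambda. The sketch below is only an illustration: `plus_for_matched` is a hypothetical helper name, and emitting `+` only where the optional group actually matched is an assumption about what might be wanted, not something the question states.

```python
import re

PATTERN = re.compile(r'(a)?([A-Z]{3})(e)?([A-Z]{3})')

def plus_for_matched(match):
    """Hypothetical helper: emit '+' only where an optional group matched."""
    # group(n) is None when an optional group did not participate in the match.
    first = '+' if match.group(1) is not None else ''
    second = '+' if match.group(3) is not None else ''
    return first + match.group(2) + second + match.group(4)

full = PATTERN.sub(plus_for_matched, 'aBCDeFGH')
partial = PATTERN.sub(plus_for_matched, 'BCDeFGH')   # no leading 'a'

print(full, partial)
```

With a plain template such as `r'+\2+\4'`, the `+` signs would be inserted whether or not the optional letters were present; a function gives room for that kind of per-match decision.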
For a fuller answer, another solution would be to use non-consuming groups (lookaheads and lookbehinds), as Giacomo stated. – Veltzer Doron Mar 28 '18 at 08:23
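A short sketch of the lookaround idea, assuming the goal is to replace a lowercase letter based on its uppercase neighbours. The lookahead pattern is the one from the comments above; the lookbehind pattern is an added illustration, not part of the original thread. Lookarounds assert context without consuming it, so only the lowercase letter is part of the match and gets replaced.

```python
import re

string = 'aBCDeFGH'

# Lookahead: match a lowercase letter only if three uppercase letters follow.
lookahead_result = re.sub(r'[a-z](?=[A-Z]{3})', '+', string)

# Lookbehind: match a lowercase letter only if three uppercase letters precede it.
# Python's re module requires lookbehind patterns to have a fixed width.
lookbehind_result = re.sub(r'(?<=[A-Z]{3})[a-z]', '+', string)

print(lookahead_result, lookbehind_result)
```

Because the uppercase runs are never consumed, there is no need to capture them and write them back in the replacement.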
