re.sub replace with matched content

Question

I'm trying to get to grips with regular expressions in Python, and I want to wrap part of a URL in HTML so that it is highlighted. My input is

images/:id/size

my output should be

images/<span>:id</span>/size

If I do this in Javascript

method = 'images/:id/size';
method = method.replace(/\:([a-z]+)/, '<span>$1</span>')
alert(method)

I get the desired result, but if I do this in Python

>>> import re
>>> method = 'images/:id/huge'
>>> re.sub('\:([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'

I don't. How do I get Python to return the correct result rather than the literal $1? Is re.sub even the right function for this?

score 135 · Accepted Answer · edited May 24 '19 at 10:10

Simply use \1 instead of $1:

In [1]: import re

In [2]: method = 'images/:id/huge'

In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'

Also note the use of raw strings (r'...') for regular expressions. It is not mandatory but removes the need to escape backslashes, arguably making the code slightly more readable.
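To make the raw-string point concrete, here is a minimal sketch comparing the three ways of writing the replacement. The variable names (`broken`, `raw`, `escaped`) are only illustrative and are not part of the original answer.

```python
import re

method = 'images/:id/huge'

# Non-raw replacement: Python itself turns '\1' into the control character
# U+0001 before re.sub ever sees it, so no group is substituted.
broken = re.sub(r'(:[a-z]+)', '<span>\1</span>', method)

# Raw replacement: the backslash reaches re.sub intact and \1 means "group 1".
raw = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# Equivalent without a raw string: escape the backslash yourself.
escaped = re.sub(r'(:[a-z]+)', '<span>\\1</span>', method)

print(repr(broken))    # 'images/<span>\x01</span>/huge'
print(raw)             # images/<span>:id</span>/huge
print(escaped == raw)  # True
```

The first call fails silently: no exception is raised, the output just contains an invisible control character, which is why this mistake is easy to miss.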

answered Aug 25 '11 at 13:32 by NPE · edited May 24 '19 at 10:10 by kubanczyk

13

For those looking for this example and wondering why it fails on your tests, make sure to add the r (character 'r') before the group string – Marcello Grechi Lins Jul 10 '15 at 17:00
4

The `r` specifier was the issue this answer helped me with as well. – kungphu Jan 29 '16 at 10:46
3

`\g<0>` works when there is no matching group, i.e. for a non-grouping regex like `':[a-z]+'`. Straight from https://docs.python.org/3/library/re.html#re.sub – ccpizza Nov 19 '17 at 15:46
is there a way to modify what's in \1 before the substitution? – gary69 Feb 09 '19 at 16:23
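On the last comment: one way to change the captured text before it is substituted is to pass a function as the replacement instead of a string, which re.sub supports. The sketch below is only an illustration. The helper name `highlight` and the choice to upper-case the match are assumptions, not something from the answer.

```python
import re

method = 'images/:id/huge'

def highlight(match):
    # match.group(1) is what \1 would have inserted; modify it freely here.
    name = match.group(1).upper()
    return '<span>' + name + '</span>'

result = re.sub(r'(:[a-z]+)', highlight, method)
print(result)  # images/<span>:ID</span>/huge
```

The function is called once per match and receives the match object, so any group can be inspected or transformed before the replacement string is built.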

Wiktor Stribiżew · Answer 2 · 2022-01-28T15:29:12.367

18

A backreference to the whole match value is \g<0>, see re.sub documentation:

The backreference \g<0> substitutes in the entire substring matched by the RE.

See the Python demo:

import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge

If you need to perform a case-insensitive search, pass flags=re.I:

re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
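As a quick check of what the flag changes, here is a small sketch using an upper-case placeholder. The input string and the variable names are made up for illustration.

```python
import re

method = 'images/:ID/huge'

# Without the flag, [a-z] does not match the upper-case letters,
# so the string comes back unchanged.
unchanged = re.sub(r':[a-z]+', r'<span>\g<0></span>', method)

# With flags=re.I the same pattern matches ':ID'.
matched = re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)

print(unchanged)  # images/:ID/huge
print(matched)    # images/<span>:ID</span>/huge
```

Pass the flag by keyword: the fourth positional parameter of re.sub is count, so a flag passed positionally would be taken as a replacement limit instead.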

answered Jan 17 '19 at 11:47 by Wiktor Stribiżew · edited Jan 28 '22 at 15:29

1

`\g<1>` etc are also valid, providing a way to replace with `\11` (\1 and the number 1) as opposed to capture group 11. – Orwellophile Jun 26 '19 at 16:02
@Orwellophile Yes, this syntax allows to use all the backreferences, not just to Group 0. – Wiktor Stribiżew Jun 26 '19 at 18:13
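The ambiguity mentioned in these comments can be sketched as follows. The sample string 'item-7' and the variable names are hypothetical, chosen only to show a literal digit following a group reference.

```python
import re

# Goal: put a literal '1' right after whatever group 1 captured.
text = 'item-7'

# r'\11' is read as a reference to group 11, which does not exist in this
# pattern, so re.sub raises re.error.
try:
    re.sub(r'(\d)', r'\11', text)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# \g<1> ends the group reference explicitly, so the following 1 is literal.
result = re.sub(r'(\d)', r'\g<1>1', text)

print(ambiguous_failed)  # True
print(result)            # item-71
```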

score 18 · Answer 3 · answered Aug 25 '11 at 13:31

Use \1 instead of $1.

\number Matches the contents of the group of the same number.

http://docs.python.org/library/re.html#regular-expression-syntax

score 6 · Answer 4 · answered Aug 25 '11 at 13:35

For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and JavaScript (amongst others) do. Furthermore, because \1 in a regular string literal is interpreted as the character U+0001, you need to use a raw string or escape the backslash (\\1).

Python 3.2 (r32:88445, Jul 27 2011, 13:41:33) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub(':([a-z]+)', r'<span>\1</span>', method)
'images/<span>id</span>/huge'
>>>
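Note that the session above yields `<span>id</span>` without the colon, because the colon sits outside the capture group. A minimal sketch of the difference, with illustrative variable names that are not from the answer:

```python
import re

method = 'images/:id/huge'

# Colon outside the group: it is matched but not captured, so \1 is just 'id'.
outside = re.sub(r':([a-z]+)', r'<span>\1</span>', method)

# Colon inside the group: \1 is ':id', the output the question asks for.
inside = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)

# Alternatively, keep the group as is and write the colon back in the replacement.
restored = re.sub(r':([a-z]+)', r'<span>:\1</span>', method)

print(outside)   # images/<span>id</span>/huge
print(inside)    # images/<span>:id</span>/huge
print(restored)  # images/<span>:id</span>/huge
```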
