86

I'm trying to get to grips with regular expressions in Python, and I want to highlight part of a URL by wrapping it in HTML. My input is

images/:id/size

my output should be

images/<span>:id</span>/size

If I do this in JavaScript

method = 'images/:id/size';
method = method.replace(/\:([a-z]+)/, '<span>$1</span>')
alert(method)

I get the desired result, but if I do this in Python

>>> method = 'images/:id/huge'
>>> re.sub('\:([a-z]+)', '<span>$1</span>', method)
'images/<span>$1</span>/huge'

I don't. How do I get Python to return the correct result rather than the literal $1? Is re.sub even the right function for this?

martineau
Blank

4 Answers

135

Simply use \1 instead of $1:

In [1]: import re

In [2]: method = 'images/:id/huge'

In [3]: re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
Out[3]: 'images/<span>:id</span>/huge'

Also note the use of raw strings (r'...') for regular expressions. It is not mandatory but removes the need to escape backslashes, arguably making the code slightly more readable.
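To make the raw-string point concrete, here is a minimal sketch comparing the three ways of writing the replacement. The variable names (plain, raw, escaped) are only illustrative.

```python
import re

method = 'images/:id/huge'

# Without the r prefix, '\1' is the single control character U+0001,
# so re.sub never sees a group reference.
plain = re.sub('(:[a-z]+)', '<span>\1</span>', method)

# A raw string or a doubled backslash both pass a real backslash-1 through.
raw = re.sub(r'(:[a-z]+)', r'<span>\1</span>', method)
escaped = re.sub('(:[a-z]+)', '<span>\\1</span>', method)

print(repr(plain))    # 'images/<span>\x01</span>/huge'
print(repr(raw))      # 'images/<span>:id</span>/huge'
print(repr(escaped))  # 'images/<span>:id</span>/huge'
```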

kubanczyk
NPE
  • 13
    For those looking at this example and wondering why it fails in your tests: make sure to add the r prefix (the character 'r') before the replacement string – Marcello Grechi Lins Jul 10 '15 at 17:00
  • 4
    The `r` specifier was the issue this answer helped me with as well. – kungphu Jan 29 '16 at 10:46
  • 3
    `\g<0>` works when there is no matching group, i.e. for a non-grouping regex like `':[a-z]+'`. Straight from https://docs.python.org/3/library/re.html#re.sub – ccpizza Nov 19 '17 at 15:46
  • Is there a way to modify what's in \1 before the substitution? – gary69 Feb 09 '19 at 16:23
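On the last comment's question: one option is to pass a function as the replacement, since re.sub accepts a callable that receives each match object and returns the replacement text. A small sketch, where wrap_upper is a hypothetical helper and uppercasing is just an example modification:

```python
import re

method = 'images/:id/huge'

def wrap_upper(match):
    # match.group(1) is the text captured by the first group, e.g. ':id'.
    return '<span>' + match.group(1).upper() + '</span>'

# re.sub calls wrap_upper once per match and inserts its return value.
result = re.sub(r'(:[a-z]+)', wrap_upper, method)
print(result)  # images/<span>:ID</span>/huge
```

Note that the return value of the function is inserted as-is, so backreferences such as \1 are not expanded inside it.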
18

A backreference to the whole match value is \g<0>, see re.sub documentation:

The backreference \g<0> substitutes in the entire substring matched by the RE.

See the Python demo:

import re
method = 'images/:id/huge'
print(re.sub(r':[a-z]+', r'<span>\g<0></span>', method))
# => images/<span>:id</span>/huge

If you need to perform a case-insensitive search, add flags=re.I:

re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)
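As a quick check of what the flag changes, here is a sketch with a made-up input containing an uppercase parameter. The variable names are illustrative only.

```python
import re

method = 'images/:ID/huge'  # hypothetical input with an uppercase parameter

# Without the flag, [a-z] does not match 'ID', so nothing is replaced.
unchanged = re.sub(r':[a-z]+', r'<span>\g<0></span>', method)

# With flags=re.I, the same pattern matches regardless of case.
wrapped = re.sub(r':[a-z]+', r'<span>\g<0></span>', method, flags=re.I)

print(unchanged)  # images/:ID/huge
print(wrapped)    # images/<span>:ID</span>/huge
```

Passing the flag by keyword matters here: the fourth positional parameter of re.sub is count, not flags.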
Wiktor Stribiżew
18

Use \1 instead of $1.

\number Matches the contents of the group of the same number.

http://docs.python.org/library/re.html#regular-expression-syntax

6

For the replacement portion, Python uses \1 the way sed and vi do, not $1 the way Perl, Java, and JavaScript (amongst others) do. Furthermore, because \1 in a regular string literal is interpreted as the character U+0001, you need to use a raw string or escape the backslash (\\1).

Python 3.2 (r32:88445, Jul 27 2011, 13:41:33) 
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> method = 'images/:id/huge'
>>> import re
>>> re.sub('(:[a-z]+)', r'<span>\1</span>', method)
'images/<span>:id</span>/huge'
>>> 
tchrist