91

Suppose I want to change "the blue dog and blue cat wore blue hats" to "the gray dog and gray cat wore blue hats".

With sed I could accomplish this as follows:

$ echo 'the blue dog and blue cat wore blue hats' | sed 's/blue \(dog\|cat\)/gray \1/g'

How can I do a similar replacement in Python? I've tried:

>>> import re
>>> s = "the blue dog and blue cat wore blue hats"
>>> p = re.compile(r"blue (dog|cat)")
>>> p.sub('gray \1',s)
'the gray \x01 and gray \x01 wore blue hats'
zekel
Eric Wilson

4 Answers

108

You need to escape your backslash:

p.sub('gray \\1', s)

Alternatively, you can use a raw string, as you already did for the regex:

p.sub(r'gray \1', s)
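To see why the original attempt produced `\x01`, here is a small sketch that reuses the question's string and pattern. The variable names `escaped` and `raw` are only illustrative.

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# In a normal string literal, '\1' is an octal escape for the single
# control character U+0001, so the regex engine never sees a backreference.
print('gray \1' == 'gray \x01')  # True

# Both spellings below hand the two characters backslash + '1' to re.
escaped = p.sub('gray \\1', s)
raw = p.sub(r'gray \1', s)

print(escaped)         # the gray dog and gray cat wore blue hats
print(raw == escaped)  # True
```

The raw-string form is usually easier to read, since the replacement looks the same as it would in sed.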
mac
39

I was looking for a similar answer, but wanted to use named groups in the replacement, so I thought I'd add the code for others:

p = re.compile(r'blue (?P<animal>dog|cat)')
p.sub(r'gray \g<animal>',s)
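A self-contained version of the named-group approach might look like the sketch below. It also shows two equivalent alternatives: a numbered reference (named groups still get a number) and a replacement function. The helper name `recolor` is made up for this example.

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (?P<animal>dog|cat)")

# Refer to the group by name in the replacement template.
by_name = p.sub(r"gray \g<animal>", s)

# A named group is also group number 1 here, so \1 still works.
by_number = p.sub(r"gray \1", s)

# A callable receives the match object and returns the replacement text.
def recolor(match):
    return "gray " + match.group("animal")

by_function = p.sub(recolor, s)

print(by_name)  # the gray dog and gray cat wore blue hats
print(by_name == by_number == by_function)  # True
```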
justcompile
24

Slightly off topic, but for numbered capture groups:

#!/usr/bin/env python
import re

# Swap the two captured groups: \1 is the digit, \2 is the word.
print(re.sub(
    pattern=r'(\d)(\w+)',
    repl='word: \\2, digit: \\1',
    string='1asdf',
))

word: asdf, digit: 1

Python's replacement syntax uses a literal backslash followed by a one-based index for numbered capture groups, as this example shows. So \1, written as '\\1' in a regular string literal, refers to the first capture group (\d), and \2 refers to the second capture group (\w+).
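One case where the plain `\1` form gets awkward is when a digit has to follow the group reference. The sketch below reuses the pattern from this answer; the variable names are just for illustration.

```python
import re

text = "1asdf"
pattern = r"(\d)(\w+)"

# Raw-string form of the replacement from the answer above.
swapped = re.sub(pattern, r"word: \2, digit: \1", text)
print(swapped)  # word: asdf, digit: 1

# \g<1> ends the group reference explicitly, so the 0 after it is literal.
appended = re.sub(pattern, r"\g<1>0 \2", text)
print(appended)  # 10 asdf

# Without \g<...>, \10 is read as a reference to group 10, which this
# pattern does not have, so re.sub raises re.error.
try:
    re.sub(pattern, r"\10 \2", text)
    group_ten_failed = False
except re.error:
    group_ten_failed = True
print(group_ten_failed)  # True
```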

ThorSummoner
  • Off topic, but I was wondering whether it is possible to replace the captured group itself. In your case `group1` is `1`; can we replace `group1` with, let's say, `5`, so the final output is something like `5asdf` (i.e., replacing the entire group)? – ooo Jul 22 '20 at 05:12
  • @anoop If I understand your goal, it sounds like you don't want to capture the `1` at all. In that case, simply don't capture it (by not enclosing it in parentheses). If you want to extract strings with regex, use `re.match` or `re.search` (and variants); that will give you, for example, a group dict (https://docs.python.org/3/library/re.html#re.Match.groupdict), and you can format/parse data from there as you like – ThorSummoner Jul 22 '20 at 18:35
  • @anoop oh, you can also simply not use the capture group (or not capture it at all) and hard-code more data into your output string, much like the words 'word: ' and ', digit: ' are in the example. – ThorSummoner Jul 22 '20 at 18:42
  • OK, let me explain it with an example. I had text similar to `function public xyzname()` and I wanted to change `public` to `private`. The only way I can do it is by grouping `function` and `xyzname()` and applying something like `\\1 private \\2`, but I was wondering if I can group just `public` as `group 1` and replace it with `private`. Is that possible? – ooo Jul 23 '20 at 16:12
  • @anoop sure, that kind of transformation is pretty common in regex use cases. You're on the right track with your question; play with it to see how it works when you apply `\\1 private \\2` and adjust it as necessary – ThorSummoner Jul 24 '20 at 06:13
  • I still don't understand how to replace `'123word'` with `'123_word'`. Should I do something like this, and how do I get at the group() function? `re.sub(r'(\d)(\d)',group(0)_group(1),'123word')` – Ali Husham Jul 22 '21 at 08:49
  • @alial-karaawi the `.group()` method is used in Python code, but the regex "substitute" function happens inside the regex engine. If we use Python's `.group()`, we are forced to round-trip from regex to Python to regex, which is okay, but may perform differently. In researching whether that can be done, I found that `re.sub` replaces the whole match rather than each capture group, so the input expression '(\d)(\d)' replaces '12' with the _repl_, not '1' and '2' each with an independent _repl_. I recommend playing with Perl expressions on test data to learn more about how a good regex engine works, without Python – ThorSummoner Jul 23 '21 at 02:36
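For the two cases raised in the comments above, a rough sketch follows. The patterns are assumptions about what the commenters' input looks like, not something they posted, so adjust them to the real text.

```python
import re

src = "function public xyzname()"

# Option 1: capture the text around 'public' and re-emit it unchanged.
with_groups = re.sub(r"(function )public( \w+\(\))", r"\1private\2", src)
print(with_groups)  # function private xyzname()

# Option 2: use lookarounds so that only 'public' is part of the match.
with_lookaround = re.sub(r"(?<=function )public(?= \w+\(\))", "private", src)
print(with_lookaround)  # function private xyzname()

# '123word' -> '123_word': capture the digits and the letters separately.
underscored = re.sub(r"(\d+)([A-Za-z]+)", r"\1_\2", "123word")
print(underscored)  # 123_word

# The same substitution with a callable, which is where group() is available.
underscored_fn = re.sub(
    r"(\d+)([A-Za-z]+)",
    lambda m: m.group(1) + "_" + m.group(2),
    "123word",
)
print(underscored_fn)  # 123_word
```

Since `re.sub` replaces the whole match, changing only one part means either capturing the rest and putting it back, or keeping the rest out of the match with lookarounds.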
8

Try this:

p.sub('gray \g<1>',s)
Alan Moore
CAB
  • 4
    Nice alternative (+1), but it still works only because `\g` is not a valid escape sequence. The safe way of writing your code should still be: `p.sub('gray \\g<1>',s)` – mac Jul 15 '11 at 18:46
  • Sorry, I meant that to be a raw string. I left out the replacement argument, too--I was on a roll! I'm deleting the comment. I agree 100% about not counting on Python's too-permissive behavior with respect to escape sequences. – Alan Moore Jul 16 '11 at 05:48
  • 2
    @mac Consider adding your comment here to your answer. It is the only thing that worked reliably in ipython notebook. – scharfmn Feb 25 '15 at 09:04
  • @mac: `\g` was chosen specifically *not to* clash with other escape codes. It would've been a poor choice by Python devs if it did. https://docs.python.org/2/library/re.html#re.sub – MestreLion Mar 08 '16 at 05:34
  • 1
    You can write `p.sub(r'gray \g<1>',s)` to prevent the `\` being parsed by Python, allowing it to be sent directly to the regex engine. – shadowtalker Aug 13 '18 at 07:02
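Putting the comments above together, a minimal sketch of the `\g<1>` form written so that it does not depend on how Python treats the unrecognized `\g` escape (newer Python versions warn about unrecognized escapes in non-raw string literals):

```python
import re

s = "the blue dog and blue cat wore blue hats"
p = re.compile(r"blue (dog|cat)")

# The raw string and the doubled backslash are the same seven-character
# suffix: backslash, g, <, 1, > after 'gray '.
print(r"gray \g<1>" == "gray \\g<1>")  # True

result = p.sub(r"gray \g<1>", s)
print(result)  # the gray dog and gray cat wore blue hats
```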