
I need to replace part of a string. I was looking through the Python documentation and found re.sub.

import re
s = '<textarea id="Foo"></textarea>'
output = re.sub(r'<textarea.*>(.*)</textarea>', 'Bar', s)
print(output)

>>>'Bar'

I was expecting this to print '<textarea id="Foo">Bar</textarea>' and not 'Bar'.

Could anybody tell me what I did wrong?

BoltClock
Pickels
  • The usual recommendation is that you not use regex for HTML. It is a longstanding response on this site, with some classic responses, culminating in this one. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – hughdbrown Oct 22 '10 at 15:59
  • Yep, was thinking to use regex since it's really small piece but switched to BeautifulSoup instead. – Pickels Oct 22 '10 at 19:27

2 Answers


Instead of capturing the part you want to replace, you can capture the parts you want to keep and then refer to them with backreferences such as \1 to include them in the substituted string.

Try this instead:

output = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)
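As a minimal sketch of the difference, reusing the string from the question: re.sub replaces the entire matched text, not only the captured group, which is why the original pattern yields just 'Bar'. The variable names `whole` and `kept` are only illustrative.

```python
import re

s = '<textarea id="Foo"></textarea>'

# re.sub replaces the whole match, so the tags matched by the
# pattern are discarded along with the (empty) captured group.
whole = re.sub(r'<textarea.*>(.*)</textarea>', 'Bar', s)

# Capturing the opening and closing tags and re-emitting them with
# \1 and \2 keeps them around the new text.
kept = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)

print(whole)  # Bar
print(kept)   # <textarea id="Foo">Bar</textarea>
```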

Also, assuming this is HTML, you should consider using an HTML parser for this task, for example Beautiful Soup.

Mark Byers
  • I think you mean `r'\1Bar\3'`. – nmichaels Oct 22 '10 at 14:07
  • As mentioned, best not to parse your own html. But for the sake of completeness, should point out that by default regular expressions are greedy, so in this example, the first capture group would match up to the **last** open angle bracket. If the string had tags inside the `)'` – Jonathan Cross Mar 25 '13 at 21:22
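To illustrate the greediness point from the comment above, here is a small sketch. The two-textarea string `s2` is a made-up input, not from the question, and the non-greedy pattern (`.*?`) is one possible adjustment rather than part of the original answer.

```python
import re

# Hypothetical input with two textareas, used only to show greediness.
s2 = '<textarea id="A">x</textarea><textarea id="B">y</textarea>'

# Greedy: the first group stretches as far as it can, so it swallows
# the whole first textarea and only the last body is replaced.
greedy = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s2)

# Non-greedy: each quantifier stops at the first possible match,
# so every textarea is handled separately.
lazy = re.sub(r'(<textarea.*?>).*?(</textarea>)', r'\1Bar\2', s2)

print(greedy)  # <textarea id="A">x</textarea><textarea id="B">Bar</textarea>
print(lazy)    # <textarea id="A">Bar</textarea><textarea id="B">Bar</textarea>
```

Note that `.` does not match newlines by default, so neither pattern handles a textarea whose content spans several lines unless re.DOTALL is passed.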

Or you could just use the search function instead:

match = re.search(r'(<textarea.*>).*(</textarea>)', s)
output = match.group(1) + 'bar' + match.group(2)
print(output)
>>>'<textarea id="Foo">bar</textarea>'
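One caveat with the search approach: re.search returns None when nothing matches, so calling match.group(1) would raise an AttributeError. A hedged sketch of a guarded version follows; the helper name `replace_textarea_body` is hypothetical, and it uses a non-greedy pattern and keeps any text outside the match, which goes slightly beyond the snippet above.

```python
import re

def replace_textarea_body(html, new_text):
    """Hypothetical helper: swap the content of the first textarea."""
    match = re.search(r'(<textarea.*?>).*?(</textarea>)', html)
    if match is None:
        # No textarea found: return the input unchanged.
        return html
    # Keep the text before and after the match, and rebuild the
    # element from the two captured tags plus the new content.
    return (html[:match.start()] + match.group(1) + new_text
            + match.group(2) + html[match.end():])

print(replace_textarea_body('<textarea id="Foo"></textarea>', 'bar'))
# <textarea id="Foo">bar</textarea>
```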
Rahul Agarwal