Python: Replace with regex

Question

I need to replace part of a string. I was looking through the Python documentation and found re.sub.

import re
s = '<textarea id="Foo"></textarea>'
output = re.sub(r'<textarea.*>(.*)</textarea>', 'Bar', s)
print(output)

>>>'Bar'

I was expecting this to print '<textarea id="Foo">Bar</textarea>' and not 'Bar'.

Could anybody tell me what I did wrong?

The usual recommendation is that you not use regex for HTML. It is a longstanding response on this site, with some classic responses, culminating in this one. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — hughdbrown, Oct 22 '10 at 15:59
Yep, was thinking to use regex since it's really small piece but switched to BeautifulSoup instead. — Pickels, Oct 22 '10 at 19:27

score 80 · Accepted Answer · answered Oct 22 '10 at 14:04

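re.sub replaces the entire match, not just the captured group, so the surrounding tags were replaced too. Instead of capturing the part you want to replace, capture the parts you want to keep and refer to them with backreferences such as \1 and \2 to include them in the replacement string.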

Try this instead:

output = re.sub(r'(<textarea.*>).*(</textarea>)', r'\1Bar\2', s)
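
As a minimal, self-contained sketch of the line above (reusing the string from the question), the following shows the two captured tags being re-inserted around the new text. The second call, with the made-up replacement text 42, is an added illustration: when the replacement begins with a digit, the \g<1> spelling avoids ambiguity about where the group number ends.

```python
import re

s = '<textarea id="Foo"></textarea>'

# Groups 1 and 2 capture the tags to keep; whatever sits between them is dropped.
pattern = r'(<textarea.*>).*(</textarea>)'

# \1 and \2 in the replacement re-insert the captured opening and closing tags.
output = re.sub(pattern, r'\1Bar\2', s)
print(output)

# \g<1> is the explicit spelling of \1, useful when a digit follows the reference.
output_digits = re.sub(pattern, r'\g<1>42\g<2>', s)
print(output_digits)
```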

Also, assuming this is HTML you should consider using an HTML parser for this task, for example Beautiful Soup.

Mark Byers

I think you mean `r'\1Bar\2'`. – nmichaels Oct 22 '10 at 14:07
As mentioned, it is best not to parse your own HTML. But for the sake of completeness, I should point out that regular expressions are greedy by default, so in this example the first capture group would match up to the **last** closing angle bracket (`>`) before `</textarea>`. If the string had tags inside the `<textarea>`, those would be included in the match. It would be better to use the question mark to prevent this: `r'(<textarea.*?>).*(</textarea.*?>)'` – Jonathan Cross Mar 25 '13 at 21:22
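
To make the greedy versus non-greedy difference concrete, here is a small sketch. The input string with a nested <b> tag is a made-up example, not from the question, and only the opening-tag group is switched to the non-greedy form.

```python
import re

# Hypothetical input: a tag nested inside the textarea.
s = '<textarea id="Foo"><b>old</b></textarea>'

greedy_pattern = r'(<textarea.*>).*(</textarea>)'
lazy_pattern = r'(<textarea.*?>).*(</textarea>)'

# Greedy: group 1 swallows the nested tags along with the opening tag.
greedy = re.search(greedy_pattern, s)
print(greedy.group(1))

# Non-greedy: group 1 stops at the first '>' that lets the rest match.
lazy = re.search(lazy_pattern, s)
print(lazy.group(1))

# The difference shows up in the substitution result.
greedy_output = re.sub(greedy_pattern, r'\1Bar\2', s)
lazy_output = re.sub(lazy_pattern, r'\1Bar\2', s)
print(greedy_output)
print(lazy_output)
```

With the greedy pattern the old nested content survives in front of the new text; with the non-greedy one it is replaced.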

score 3 · Answer 2 · answered Dec 09 '14 at 17:28

Or you could just use the search function instead:

match = re.search(r'(<textarea.*>).*(</textarea>)', s)
output = match.group(1) + 'bar' + match.group(2)
print(output)
>>>'<textarea id="Foo">bar</textarea>'
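
Two caveats with the search approach: re.search returns None when nothing matches, and rebuilding the string from the two groups alone discards any text outside the match. A hedged sketch that handles both follows; the helper name replace_textarea_body and the sample inputs are hypothetical, and the non-greedy opening-tag pattern is borrowed from the comment on the accepted answer.

```python
import re

def replace_textarea_body(html, new_text):
    """Hypothetical helper: swap the textarea body, keeping text around the match."""
    match = re.search(r'(<textarea.*?>).*(</textarea>)', html)
    if match is None:
        # No textarea found: return the input unchanged instead of raising.
        return html
    # Keep everything before and after the matched span.
    return (html[:match.start()] + match.group(1) + new_text
            + match.group(2) + html[match.end():])

with_textarea = replace_textarea_body('<p>x</p><textarea id="Foo"></textarea>', 'bar')
without_textarea = replace_textarea_body('<p>no textarea here</p>', 'bar')
print(with_textarea)
print(without_textarea)
```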

Rahul Agarwal
