Regex replace before and after text, keep text in place

Question

I have some text like

<br />
blah
<br />
blah blah

Which I'm trying to change to:

<p>
blah
</p>
<p>
blah blah
</p>

I have the following regex

newContent = re.sub("<br />(?=(.*(<br />)?\n)<br />)","<p>",newContent)

But this isn't going to work the way I want. I want the part before the lookahead to be replaced with <p> and the part after the lookahead to be replaced with </p>.

Is this possible?

Word of warning: [Regex is not a tool that can be used to correctly parse HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — jbabey, Oct 25 '13 at 13:48
Another day. Another chance to post this explanation: http://stackoverflow.com/questions/6751105/why-its-not-possible-to-use-regex-to-parse-html-xml-a-formal-explanation-in-la — Cfreak, Oct 25 '13 at 13:50
This question appears to be off-topic because it is [about parsing HTML with RegEx](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags)! — , Oct 25 '13 at 13:59
Act like if I didn't give you a [regex solution](http://regex101.com/r/sV1mL5) :P `
(.*?)(?=
|$)`, replace with `
\1
` — HamZa, Oct 25 '13 at 14:06

score 3 · Answer 1 · answered Oct 25 '13 at 13:57

Listen to the people suggesting that you use an HTML parser, such as BeautifulSoup:

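from bs4 import BeautifulSoup

# 'htmlfile' is the input file containing the <br /> markup
with open('htmlfile', 'r') as f:
    soup = BeautifulSoup(f, 'html')

for br in soup.find_all('br'):
    # Move the text that follows each <br /> into a new <p>, then swap the <p> in for the <br />
    p = soup.new_tag('p')
    p.string = br.next_sibling.extract()
    br.replace_with(p)

print(soup.prettify())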

Run it like:

python3 script.py

That yields:

<html>
 <body>
  <p>
   blah
  </p>
  <p>
   blah blah
  </p>
 </body>
</html>

score 1 · Answer 2 · answered Oct 25 '13 at 13:51


You can't do that with regexes, because a substitution only replaces each matched piece of text in place and does not carry results over to the text that follows. The most you can do is a workaround like this one:

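s = "html code"  # placeholder for the input text
# Split on the <br /> markers and drop empty pieces (e.g. the one before a leading <br />)
parts = [part for part in s.split("<br />") if part.strip()]
s = "<p>" + "</p><p>".join(parts) + "</p>"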
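
To make the workaround concrete on the question's sample input, here is a minimal sketch. The function name `split_to_paragraphs` and its `marker` parameter are illustrative and not part of the original answer; it assumes every chunk between markers should become its own paragraph and that surrounding whitespace can be trimmed.

```python
def split_to_paragraphs(text, marker="<br />"):
    """Turn marker-separated chunks into <p> blocks (hypothetical helper)."""
    # Split on the marker and trim the whitespace around each chunk
    chunks = [chunk.strip() for chunk in text.split(marker)]
    # Drop empty chunks, such as the one before a leading marker
    chunks = [chunk for chunk in chunks if chunk]
    # Emit each chunk in the layout the question asks for
    return "".join(f"<p>\n{chunk}\n</p>\n" for chunk in chunks)


sample = "<br />\nblah\n<br />\nblah blah\n"
print(split_to_paragraphs(sample))
```

Trimming and filtering the chunks avoids the empty `<p></p>` that a plain split-and-join would produce when the text starts with the marker.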

answered Oct 25 '13 at 13:51

Roman Dobrovenskii


score 1 · Answer 3 · answered Oct 25 '13 at 14:06

A simple regular expression is enough; there is no need for splitting or BeautifulSoup.

import re
t = '(.+)(blah)(.+)(blah blah)'
r = r"""<p>
\2
</p>
<p>
\4
</p>
"""
s = """<br />
blah
<br />
blah blah
"""
print(re.sub(t, r, s, flags=re.S))

It gives

<p>
blah
</p>
<p>
blah blah
</p>
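
The pattern above spells out the literal text `blah` and `blah blah`, so it only fits this exact input. A more general variant might capture whatever follows each `<br />` and stop at the next marker, roughly as sketched below. The helper name `br_to_paragraphs` is an assumption for illustration, and the sketch assumes flat text with no nested markup.

```python
import re


def br_to_paragraphs(text):
    """Wrap the text that follows each <br /> in <p>...</p> (hypothetical helper)."""
    # <br />\s*          the marker plus trailing whitespace (consumed, so dropped)
    # (.*?)              the text to keep, captured lazily as group 1
    # \s*(?=<br />|\Z)   stop just before the next marker or at the end of the input
    pattern = re.compile(r"<br />\s*(.*?)\s*(?=<br />|\Z)", flags=re.S)
    return pattern.sub(r"<p>\n\1\n</p>\n", text)


sample = "<br />\nblah\n<br />\nblah blah\n"
print(br_to_paragraphs(sample))
```

The lookahead is zero-width, so the next `<br />` is left in place for the following match, while the captured group carries the kept text into the replacement.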
