1

Good day. I have a small problem with a regexp.

My code looks like this:

rexp2 = re.findall(r'<p>(.*?)</p>', data)

And I need to grab everything inside the <p> element of this markup:

<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>

But my code doesn't work :( What am I doing wrong?

Alexander

4 Answers

4

Statutory Warning: It is a Bad Idea to parse (X)HTML using regular expressions.

Fortunately there is a better way. To get going, first install the BeautifulSoup module. Next, read up on the documentation. Third, code!

Here is one way to do what you are trying to do:

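from bs4 import BeautifulSoup  # Beautiful Soup 4 (pip install beautifulsoup4)

html = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, "html.parser")
# find_all returns every <p> element in the document
for each in soup.find_all("p"):
    print(each)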
Manoj Govindan
1

I wouldn't recommend using regular expressions this way. Try parsing HTML with Beautiful Soup instead and walk the DOM tree.
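
If installing Beautiful Soup is not an option, the same "use a parser, not a regexp" idea can be sketched with the standard library's html.parser module. This is a swapped-in technique, not Beautiful Soup itself, and ParagraphCollector is a hypothetical helper name made up for this illustration:

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text chunks found inside each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start a new bucket of text whenever a <p> opens
        if tag == "p":
            self.in_p = True
            self.paragraphs.append([])

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Keep only non-empty text that sits inside a <p>
        text = data.strip()
        if self.in_p and text:
            self.paragraphs[-1].append(text)


html = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""

collector = ParagraphCollector()
collector.feed(html)
collector.close()
print(collector.paragraphs)
```

Each inner list holds the text pieces of one paragraph, split wherever a tag such as <br /> interrupts the text.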

duffymo
0

By default the dot does not match newline characters. Use re.DOTALL:

re.findall(r'<p>(.*?)</p>', data, re.DOTALL)
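
A small sketch of the difference, using the markup from the question as a stand-in for data. The final split on <br /> is only an illustrative cleanup step and not something the question asked for:

```python
import re

# Sample markup from the question; `data` stands in for the fetched page
data = """<div id="header">
<h1></h1>
<p>
localhost OpenWrt Backfire<br />
Load: 0.00 0.00 0.00<br />
Hostname: localhost
</p>
</div>"""

pattern = r'<p>(.*?)</p>'

# Without flags, "." stops at every newline, so nothing matches
no_flag = re.findall(pattern, data)

# re.MULTILINE only changes how ^ and $ anchor, so it does not help here
multiline = re.findall(pattern, data, re.MULTILINE)

# With re.DOTALL, "." also matches "\n" and the paragraph body is captured
dotall = re.findall(pattern, data, re.DOTALL)

# Illustrative cleanup: one entry per <br />-separated line
lines = [part.strip() for part in re.split(r'<br\s*/?>', dotall[0])]

print(no_flag, multiline)
print(lines)
```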
jcubic
0

To match across several lines you need the re.S (re.DOTALL) flag, which lets the dot match newlines; re.M (multiline) only changes how ^ and $ anchor. But parsing HTML with regexps isn't a particularly good idea.

It looks like you want some stats from an OpenWrt-powered router. Why don't you write a simple CGI script that outputs the required information in a machine-readable format?
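
A minimal sketch of such a script, assuming Python 3 is available on the router (nothing in the question says it is, and on a small device a shell script may be more realistic). The JSON field names and the render_response helper are made up for illustration:

```python
#!/usr/bin/env python3
import json
import os
import socket


def render_response(hostname, load):
    """Build a complete CGI response: header, blank line, JSON body.

    The field names "hostname" and "load" are assumptions, not a fixed format.
    """
    body = json.dumps({"hostname": hostname, "load": list(load)})
    return "Content-Type: application/json\r\n\r\n" + body


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages on Unix
    print(render_response(socket.gethostname(), os.getloadavg()), end="")
```

The client can then read the body with json.loads instead of scraping HTML.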

rkhayrov