What should be the regular expression for the following text?

Question

<p class="body">
  Giving the meeting of NITI Aayog in New Delhi a miss, West Bengal Chief        Minister and Trinamool Congress chairperson Mamata Banerjee said in Bardhaman on Wednesday that the Centre should withdraw the land acquisition ordinance.
</p>

I just want to get the content between <p class="body"> and </p>.

re.search(">(.+?)<",text) is returning None

What have you tried so far? What's not working? Also, please realize that regular expressions [are not the best tool](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) for general HTML parsing... — Cameron, Jul 15 '15 at 15:47
I accidentally flagged this as spam. Without the code markup it looked like someone was trying to squeeze in a political statement. Sorry for the mistake. — Bryan Oakley, Jul 15 '15 at 15:47
Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask). And more importantly, please read [the Stack Overflow question checklist](http://meta.stackexchange.com/q/156810/204922). You might also want to learn about [Minimal, Complete, and Verifiable Examples](http://stackoverflow.com/help/mcve). — Morgan Thrapp, Jul 15 '15 at 15:58
I am not able to understand why it is returning None every time — chandan sr, Jul 15 '15 at 16:04
I don't know python but is there a modifier so the `.` extends to new lines as well? — chris85, Jul 15 '15 at 16:05
Maybe this would be useful, https://docs.python.org/2/library/re.html#re.S — chris85, Jul 15 '15 at 16:15
I also used re.search(">(.+?)<",str(each),re.M) but same result — chandan sr, Jul 15 '15 at 16:21
also this: val = re.search("""
\n (.+?)\n
""",str(each),re.X) — chandan sr, Jul 15 '15 at 16:26
Thanks guys : re.search(">(.+?)<",str(each),re.DOTALL) worked perfectly for me... — chandan sr, Jul 15 '15 at 16:29

chris85 · Answer 1 · 2015-07-15T18:10:57.927

By default, the . does not match newlines, so your current regex doesn't match.

re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

The docs list the flags that can be used to alter how a regex behaves: https://docs.python.org/2/library/re.html#re.S

So without the modifier, the regex can only work with the first line, because .+? stops at the newline that follows the opening tag:

<p class="body">

With the modifier, the match can span the whole element:

<p class="body">
  Giving the meeting of NITI Aayog in New Delhi a miss, West Bengal Chief        Minister and Trinamool Congress chairperson Mamata Banerjee said in Bardhaman on Wednesday that the Centre should withdraw the land acquisition ordinance.
</p>
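
The difference can be reproduced with a short script. This is a minimal sketch: the sample string is an abbreviated version of the paragraph from the question, and the variable names are made up for illustration. It also shows why the re.M attempt from the comments did not help, since that flag only changes how ^ and $ behave.

```python
import re

# Abbreviated version of the paragraph from the question (assumption:
# the shortened sentence is only here to keep the example readable).
text = """<p class="body">
  Giving the meeting of NITI Aayog in New Delhi a miss.
</p>"""

pattern = r">(.+?)<"

# No flag: '.' cannot match the newline right after the opening tag,
# so '.+?' has nothing to consume and the search fails.
no_flag = re.search(pattern, text)

# re.M (re.MULTILINE) only affects '^' and '$', not '.', so it fails too.
multiline = re.search(pattern, text, re.M)

# re.DOTALL (alias re.S) lets '.' match '\n' as well.
dotall = re.search(pattern, text, re.DOTALL)

print(no_flag)                    # None
print(multiline)                  # None
print(dotall.group(1).strip())    # the paragraph text without the tags
```

Note that group(1) still contains the leading and trailing newline and indentation, which is why the sketch calls strip() on it.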

You should consider using an HTML parser for this, though. This regex stops at the first < it finds, so it will fail as soon as the paragraph contains any additional elements, for example:

<p class="body">
  Giving the meeting of <em>ITI</em> Aayog in New Delhi a miss, West Bengal Chief        Minister and Trinamool Congress chairperson Mamata Banerjee said in Bardhaman on Wednesday that the Centre should withdraw the land acquisition ordinance.
</p>

You can see this in action here.
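
As a rough illustration of the parser route, here is a sketch using html.parser from the Python 3 standard library (a third-party library such as BeautifulSoup would be another option). The class name BodyParagraphText and the abbreviated sample string are hypothetical, chosen only for this example. It assumes a single, well-formed <p class="body"> element.

```python
import re
from html.parser import HTMLParser


class BodyParagraphText(HTMLParser):
    """Collects text inside <p class="body">, including nested elements."""

    def __init__(self):
        super().__init__()
        self.inside = False   # True while we are inside the target <p>
        self.chunks = []      # text fragments found inside it

    def handle_starttag(self, tag, attrs):
        if tag == "p" and dict(attrs).get("class") == "body":
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.chunks.append(data)


# Abbreviated sample with a nested <em> element (assumption for illustration).
html_text = """<p class="body">
  Giving the meeting of <em>NITI</em> Aayog in New Delhi a miss.
</p>"""

# The regex from the question stops at the '<' of '<em>'.
regex_result = re.search(r">(.+?)<", html_text, re.DOTALL).group(1).strip()

parser = BodyParagraphText()
parser.feed(html_text)
parser.close()
# Join the fragments and collapse runs of whitespace into single spaces.
parsed = " ".join("".join(parser.chunks).split())

print(regex_result)   # only the text before the nested element
print(parsed)         # the full paragraph text
```

The parser version keeps the text inside the nested element, while the regex result is cut off just before it.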
