<p class="body">
  Giving the meeting of NITI Aayog in New Delhi a miss, West Bengal Chief        Minister and Trinamool Congress chairperson Mamata Banerjee said in Bardhaman on Wednesday that the Centre should withdraw the land acquisition ordinance.
</p>

I just want to get the content between `<p class="body">` and `</p>`,

but `re.search(">(.+?)<",text)` keeps returning `None`.

abelenky
chandan sr
  • What have you tried so far? What's not working? Also, please realize that regular expressions [are not the best tool](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) for general HTML parsing. – Cameron Jul 15 '15 at 15:47
  • I accidentally flagged this as spam. Without the code markup it looked like someone was trying to squeeze in a political statement. Sorry for the mistake. – Bryan Oakley Jul 15 '15 at 15:47
  • Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named ["What topics can I ask about here?"](http://stackoverflow.com/help/on-topic) and ["What types of questions should I avoid asking?"](http://stackoverflow.com/help/dont-ask). And more importantly, please read [the Stack Overflow question checklist](http://meta.stackexchange.com/q/156810/204922). You might also want to learn about [Minimal, Complete, and Verifiable Examples](http://stackoverflow.com/help/mcve). – Morgan Thrapp Jul 15 '15 at 15:58
  • `re.search("<p>(.+?)</p>",text)` – chandan sr Jul 15 '15 at 16:03
  • I don't understand why it returns `None` every time. – chandan sr Jul 15 '15 at 16:04
  • I don't know Python, but is there a modifier so that `.` matches newlines as well? – chris85 Jul 15 '15 at 16:05
  • As far as I know, `.` does not match a newline. – chandan sr Jul 15 '15 at 16:08
  • Maybe this would be useful: https://docs.python.org/2/library/re.html#re.S – chris85 Jul 15 '15 at 16:15
  • I also tried `re.search(">(.+?)<",str(each),re.M)`, but got the same result. – chandan sr Jul 15 '15 at 16:21
  • Also this: `val = re.search("""<p>\n (.+?)\n</p>""",str(each),re.X)` – chandan sr Jul 15 '15 at 16:26
  • Thanks, guys: `re.search(">(.+?)<",str(each),re.DOTALL)` worked perfectly for me. – chandan sr Jul 15 '15 at 16:29
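A minimal sketch (Python 3) of why `re.M` made no difference in the comments while `re.DOTALL` did. It runs on a shortened, hypothetical copy of the sample paragraph rather than the full text:

```python
import re

# Shortened, hypothetical copy of the sample paragraph.
text = '<p class="body">\n  Giving the meeting of NITI Aayog a miss.\n</p>'

# re.M (MULTILINE) only changes what ^ and $ match; '.' still refuses '\n'.
print(re.search(">(.+?)<", text, re.M))  # None

# re.DOTALL (alias re.S) lets '.' match '\n', so the lazy group can reach '</p>'.
match = re.search(">(.+?)<", text, re.DOTALL)
print(repr(match.group(1)))  # '\n  Giving the meeting of NITI Aayog a miss.\n'
```

The captured group still carries the newlines and indentation around the text, so you will usually want to `strip()` it.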

1 Answer


By default, `.` doesn't match newlines, so your current regex finds no match.

re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

The documentation lists the flags (modifiers) that change how a regex behaves: https://docs.python.org/2/library/re.html#re.S
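To see the flag in isolation, here is a minimal sketch on a made-up two-line string (the string is hypothetical, not from the question). It also shows the inline `(?s)` form, which turns on the same behaviour from inside the pattern itself:

```python
import re

s = "first line\nsecond line"

# Without the flag, '.' will not match the newline, so the pattern cannot span both lines.
print(re.search("line.second", s))  # None

# With re.DOTALL, '.' matches '\n' as well.
print(repr(re.search("line.second", s, re.DOTALL).group()))  # 'line\nsecond'

# The inline flag (?s) at the start of the pattern has the same effect.
print(repr(re.search("(?s)line.second", s).group()))  # 'line\nsecond'
```

The inline form is handy when the pattern is passed to code you don't control and you can't add a `flags` argument.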

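Without the modifier, the regex can't get past the end of the first line:

<p class="body">

The `.+?` after that `>` can't consume the newline that follows it, and the only other `>` in the string is the last character of `</p>`, so `re.search` finds no match and returns `None`.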

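With the modifier, `.+?` can cross newlines, so the match runs from the `>` of the opening tag to the first `<` after it (the one in `</p>`):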

<p class="body">
  Giving the meeting of NITI Aayog in New Delhi a miss, West Bengal Chief        Minister and Trinamool Congress chairperson Mamata Banerjee said in Bardhaman on Wednesday that the Centre should withdraw the land acquisition ordinance.
</p>
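A sketch of the full extraction on the question's sample, assuming you want the paragraph text with its whitespace tidied up. Anchoring the pattern on `<p class="body">` and `</p>` instead of on any `>`...`<` pair is a variation added here, not something from the question:

```python
import re

html = """<p class="body">
  Giving the meeting of NITI Aayog in New Delhi a miss, West Bengal Chief        Minister and Trinamool Congress chairperson Mamata Banerjee said in Bardhaman on Wednesday that the Centre should withdraw the land acquisition ordinance.
</p>"""

# Anchor on the actual tags; re.DOTALL lets the lazy .*? cross the newlines.
match = re.search(r'<p class="body">(.*?)</p>', html, re.DOTALL)
if match:
    raw = match.group(1)
    # Drop the surrounding newline/indentation and collapse internal runs of spaces.
    text = " ".join(raw.split())
    print(text)
```

Collapsing with `split()`/`join()` also removes the long run of spaces between "Chief" and "Minister" in the source HTML.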

You should consider using an HTML parser for this, though. This regex breaks as soon as the paragraph contains another element, because `.+?` stops at the first `<` it reaches. For example, here it stops at `<em>`:

<p class="body">
  Giving the meeting of <em>ITI</em> Aayog in New Delhi a miss, West Bengal Chief        Minister and Trinamool Congress chairperson Mamata Banerjee said in Bardhaman on Wednesday that the Centre should withdraw the land acquisition ordinance.
</p>
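As one parser-based alternative, here is a minimal sketch using the standard library's `html.parser.HTMLParser` (Python 3). A third-party parser such as BeautifulSoup would work too; the stdlib one is just used here to keep the example dependency-free. The class name `BodyTextParser` is hypothetical, and the sample is a shortened version of the `<em>` example:

```python
import re
from html.parser import HTMLParser


class BodyTextParser(HTMLParser):
    """Collects text inside <p class="body">, including text of nested tags."""

    def __init__(self):
        super().__init__()
        self.inside = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only an exact class="body" counts.
        if tag == "p" and ("class", "body") in attrs:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        # Nested tags such as <em> are ignored, but their text is kept.
        if self.inside:
            self.parts.append(data)


html = """<p class="body">
  Giving the meeting of <em>ITI</em> Aayog in New Delhi a miss.
</p>"""

# The regex from the question stops at the first '<', i.e. at <em>.
print(repr(re.search(">(.+?)<", html, re.DOTALL).group(1)))  # '\n  Giving the meeting of '

parser = BodyTextParser()
parser.feed(html)
parser.close()
result = " ".join("".join(parser.parts).split())
print(result)  # Giving the meeting of ITI Aayog in New Delhi a miss.
```

The parser keeps track of where the paragraph starts and ends, so nested inline tags no longer cut the text short.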


chris85