
How can I grab the entire content inside the <body> tag with a regex?

For instance, given:

<html><body><p><a href="#">xx</a></p>

<p><a href="#">xx</a></p></body></html> 

I want to return only this:

<p><a href="#">xx</a></p>

<p><a href="#">xx</a></p>

Or is there a better approach? I could use DOM, but then I would have to call saveHTML(), which also returns the doctype and the <body> tag...
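For reference, the DOM route need not return the doctype: since PHP 5.3.6, DOMDocument::saveHTML() accepts an optional node and serialises only that node. The sketch below assumes PHP 7.0+ (for the type declarations) with the dom extension enabled; bodyInnerHtml() is a hypothetical helper name, and exact whitespace in the output may vary with the libxml version.

```php
<?php
// Hypothetical helper: return the inner HTML of <body> via DOMDocument.
// Assumes PHP >= 7.0 and ext-dom; saveHTML($node) needs PHP >= 5.3.6.
function bodyInnerHtml(string $html): string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    // Serialise each child of <body> on its own, so neither the doctype
    // nor the <html>/<body> tags end up in the result.
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$html = '<html><body><p><a href="#">xx</a></p><p><a href="#">xx</a></p></body></html>';
echo bodyInnerHtml($html);
```

Because the markup is parsed rather than pattern-matched, attributes containing `>` or comments containing `</body>` do not confuse it.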

HTML Purifier is a pain to use, so I decided against it. I thought regex might be the next best option for my disaster.


2 Answers

preg_match("/<body[^>]*>(.*?)<\/body>/is", $html, $matches);

$matches[1] will contain the contents of the body tag.
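A small sketch wrapping the preg_match() call above so a missing <body> yields null instead of an undefined-index notice. bodyContents() is a hypothetical name, and PHP 7.1+ is assumed for the nullable return type. Note that the lazy `(.*?)` stops at the first `</body>` it finds.

```php
<?php
// Hypothetical wrapper around the regex from the answer above.
// Assumes PHP >= 7.1 (nullable return type).
function bodyContents(string $html): ?string
{
    // i: also match <BODY>; s: let "." match newlines.
    // [^>]* skips any attributes on the opening <body> tag.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches) === 1) {
        return $matches[1];
    }
    // No <body>...</body> pair was found.
    return null;
}

$html = "<html><body><p><a href=\"#\">xx</a></p>\n\n<p><a href=\"#\">xx</a></p></body></html>";
var_dump(bodyContents($html));
var_dump(bodyContents('<p>no body tag</p>'));
```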

Flambino
    See the valid(!) HTML examples at http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege/702222#702222 and see how a regular expression fails on them. – Shi Jul 31 '11 at 21:14
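One concrete illustration of that point, using hypothetical sample markup: a `>` inside a quoted attribute value is valid HTML, but `[^>]*` stops at it, so the captured group begins partway through the attribute instead of at the first child element.

```php
<?php
// Hypothetical input: valid HTML with '>' inside a quoted attribute value.
$html = '<html><body title="a>b"><p>xx</p></body></html>';

preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches);

// [^>]* consumes ' title="a' and stops at the '>' inside the attribute,
// so $matches[1] starts with the leftover 'b">' rather than with <p>.
var_dump($matches[1]);
```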
preg_match("~<body.*?>(.*?)<\/body>~is", $html, $match);
print_r($match);
genesis