How can I grab the entire content inside the `<body>` tag with regex?

Question

How can I grab the entire content inside the <body> tag with regex?

For instance,

<html><body><p><a href="#">xx</a></p>

<p><a href="#">xx</a></p></body></html>

I want to return this only,

<p><a href="#">xx</a></p>

<p><a href="#">xx</a></p>

Or are there any better ideas? Maybe DOM, but then I have to use saveHTML(), which returns the doctype and the body tag as well...

HTML Purifier is a pain to use, so I decided not to use it. I thought regex could be the next best option for my disaster.

Take a look at this post http://stackoverflow.com/questions/3577641/best-methods-to-parse-html-with-php/3577662#3577662 — Ewout Kleinsmann, Jul 31 '11 at 20:48
Don't use regexes. http://htmlparsing.com/php.html gives you examples of how to use a proper HTML parser. In fact, if you're using simple_html_dom, it's as simple as `file_get_html('http://www.google.com/')->plaintext;` — Andy Lester, Dec 18 '12 at 18:09
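
To illustrate the parser route for the exact problem in the question (saveHTML() returning the doctype and body tag), here is a minimal sketch using PHP's built-in DOMDocument. It assumes PHP 7.1 or later and non-empty input; the function name `extractBodyWithDom` is hypothetical and not from any of the linked posts. The idea is to pass each child node of `<body>` to `saveHTML()`, which then serializes only that node.

```php
<?php
// Hypothetical helper: returns the inner HTML of <body> using DOMDocument.
function extractBodyWithDom(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $inner = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes only that node, so no doctype,
        // <html>, or <body> wrapper ends up in the result.
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

$html = "<html><body><p><a href=\"#\">xx</a></p>\n\n<p><a href=\"#\">xx</a></p></body></html>";
$inner = extractBodyWithDom($html);
echo $inner;
```

Because the parser builds a tree first, a stray `</body>` inside a comment or attribute value should not cut the result short the way a pattern match can.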

Flambino · Accepted Answer · 2011-07-31T20:55:12.767

28

preg_match("/<body[^>]*>(.*?)<\/body>/is", $html, $matches);

$matches[1] will contain the contents of the body tag if the pattern matches.
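
As a rough sketch of how the line above might be used in practice, the snippet below wraps it in a function and checks the return value of `preg_match` before reading `$matches[1]`. The function name `extractBodyWithRegex` is hypothetical, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the inner HTML of <body>, or null when
// the pattern does not match.
function extractBodyWithRegex(string $html): ?string
{
    // i = case-insensitive, s = "." also matches newlines.
    // (.*?) is lazy, so the capture ends at the first </body>.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$html = "<html><body><p><a href=\"#\">xx</a></p>\n\n<p><a href=\"#\">xx</a></p></body></html>";
$body = extractBodyWithRegex($html);
echo $body; // the two <p> elements, without the <html>/<body> wrappers
```

Checking for `=== 1` matters because `preg_match` returns 0 on no match and `false` on error, and in both cases `$matches[1]` would not be set.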

edited Jul 31 '11 at 20:55

answered Jul 31 '11 at 20:49

Flambino


See the valid(!) HTML examples at http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege/702222#702222 and see how you fail with a regular expression. – Shi Jul 31 '11 at 21:14
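
One small case that shows the kind of failure this comment points at: valid HTML in which `</body>` appears inside a comment. The input string below is made up for illustration; the pattern is the one from the answer.

```php
<?php
// Made-up input: the comment contains the text "</body>".
$html = '<html><body><p>one</p><!-- </body> --><p>two</p></body></html>';

preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches);

// The lazy (.*?) stops at the first "</body>", which is the one inside
// the comment, so the second paragraph is lost.
echo $matches[1]; // prints: <p>one</p><!-- 
```

For input you control and know to be simple, the regex may still be good enough; for arbitrary pages, this is the sort of edge case an HTML parser handles and a pattern does not.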

score 1 · Answer 2 · answered Jul 31 '11 at 20:52


preg_match("~<body.*?>(.*?)<\/body>~is", $html, $match);
print_r($match);

answered Jul 31 '11 at 20:52

genesis
