preg_match problem

Question

I have an index.html file:

    <html> <head> bla bla bla </head> <body class="someclass"> bla bla bla </body> </html>

I need to get the content inside the body tag. I tried this:

    <?php
    $site = file_get_contents("index.html");
    preg_match("/<body[^>]*>(.*?) \/body>/is", $site, $matches);
    print ($matches[1]);
    ?>

But it doesn't output anything. What is the problem here? Thank you.

*(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) — Gordon, Jul 14 '11 at 12:01

score 1 · Answer 1 · answered Jul 14 '11 at 10:51


    <?php
    $site = file_get_contents("index.html");
    preg_match("/<body.*?>(.*?)<\/body>/is", $site, $matches);
    print ($matches[1]);
    ?>
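A slightly more defensive sketch of the same fix, assuming PHP 7.1+ (for the nullable return type): it checks the return value of preg_match() before reading `$matches[1]`, so a page without a matching `<body>...</body>` pair gives `null` instead of an undefined-index warning. It keeps the question's `[^>]*` for the opening tag; `extractBodyContent` is a hypothetical name, and the inline string stands in for `file_get_contents("index.html")`.

```php
<?php
// Hypothetical helper: returns the text between <body ...> and </body>,
// or null when the pattern does not match.
function extractBodyContent(string $html): ?string
{
    // s: let "." match newlines; i: case-insensitive tag names.
    // [^>]* skips the attributes of the opening tag (e.g. class="someclass").
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $matches) === 1) {
        return $matches[1];
    }
    return null; // no <body>...</body> pair found, or a regex error
}

// In the question this string comes from file_get_contents("index.html").
$site = '<html> <head> bla bla bla </head> <body class="someclass"> bla bla bla </body> </html>';

$body = extractBodyContent($site);
var_dump($body); // string(13) " bla bla bla "
```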


genesis

score 0 · Answer 2 · answered Jul 14 '11 at 10:52


This may not be the answer you're looking for, but I recommend trying PHP's DOMDocument class.
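A minimal sketch of the DOMDocument approach, assuming the dom extension (bundled with most PHP builds) is available. It parses the markup, takes the first `<body>` element and concatenates `saveHTML()` of each child node, which yields the inner HTML without the `<body>` tag itself. The function name `bodyInnerHtml` is hypothetical.

```php
<?php
// Hypothetical helper: returns the inner HTML of <body>, or null if there is none.
function bodyInnerHtml(string $html): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML often triggers parser warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    // Serialize each child of <body> so the result excludes the <body> tag itself.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Well-formed stand-in for index.html (normally file_get_contents("index.html")).
$site = '<html><head><title>Test</title></head><body class="someclass"><p>bla bla bla</p></body></html>';
$inner = bodyInnerHtml($site);
echo $inner, "\n";
```

The sample is a well-formed stand-in: the question's `<head> bla bla bla </head>` puts text where HTML does not allow it, and an HTML parser may move that text into the body, whereas a regex simply ignores such structure.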


score 0 · Answer 3 · answered Jul 14 '11 at 10:52


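`"/<body[^>]*>(.*?) \/body>/is"` should be `"/<body[^>]*>(.*?)<\/body>/is"`. Your pattern has a space where the `<` of `</body>` belongs, so it never matches and `$matches[1]` is never set.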
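To see the difference, this sketch runs both patterns against the sample markup from the question (inlined instead of read from index.html). preg_match() returns 1 for a match, 0 for no match and false on error; on no match `$matches` is left empty, which is why the original `print ($matches[1]);` printed nothing.

```php
<?php
$site = '<html> <head> bla bla bla </head> <body class="someclass"> bla bla bla </body> </html>';

$patterns = [
    'original'  => '/<body[^>]*>(.*?) \/body>/is', // space instead of "<" before /body>
    'corrected' => '/<body[^>]*>(.*?)<\/body>/is',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    $results[$label] = preg_match($pattern, $site, $matches);
    printf("%-9s => %d, body: %s\n", $label, $results[$label], $matches[1] ?? '(no match)');
}
```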


adlawson

score 0 · Answer 4 · answered Jul 14 '11 at 11:04


You should take a look at PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/

You can get the body with something like this:

    $html = file_get_html('index.html');
    $body = $html->find('body', 0); // index 0: return the first match, not an array

You can then get its inner HTML with:

    $content = $body->innertext;


rgubby

Suggested third party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [libxml](http://xmlsoft.org/html/libxml-HTMLparser.html) instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org). – Gordon Jul 14 '11 at 12:02
