0

I have an index.html file:

<html> <head> bla bla bla </head> <body class="someclass"> bla bla bla </body> </html>

I need to get the content inside the body tag. I tried this:

<?php
$site = file_get_contents("index.html");
preg_match("/<body[^>]*>(.*?) \/body>/is", $site, $matches);
print ($matches[1]);
?>

But it does not output anything. Please tell me what the problem is. Thank you.

Tran
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Jul 14 '11 at 12:01

4 Answers

1
<?php
$site = file_get_contents("index.html");
// Capture everything between the opening and closing body tags.
if (preg_match("/<body.*?>(.*?)<\/body>/is", $site, $matches)) {
    print ($matches[1]);
}
?>
genesis
0

This may not be the answer you are looking for, but I recommend trying PHP's DOMDocument.
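
As a rough illustration of the DOMDocument route, here is a minimal sketch. The helper name bodyInnerHtml is hypothetical, the sample markup is adapted from the question (a p tag is added to show that child tags survive), and the nullable return type assumes PHP 7.1 or later. It is one possible approach, not the only way to do it.

```php
<?php
// Hypothetical helper (not from the thread): returns the inner HTML of
// the first <body> element, or null if the document has none.
function bodyInnerHtml(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally so imperfect markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    // Serialize each child node so the <body> tag itself is left out.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Sample markup adapted from the question (assumption: inline string
// instead of reading index.html, so the sketch runs on its own).
$sample = '<html> <head> <title>bla</title> </head> '
        . '<body class="someclass"> <p>bla bla bla</p> </body> </html>';

$content = bodyInnerHtml($sample);
echo $content;
```

To use it on the file from the question, pass file_get_contents("index.html") instead of $sample. Unlike a regular expression, the parser copes with attributes that contain a > character and with uppercase or unclosed tags.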

0

"/<body[^>]*>(.*?) \/body>/is" Should be "/<body[^>]*>(.*?)<\/body>/is"

adlawson
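
The one-character difference pointed out in the answer above can be checked directly. This is a small sketch that runs both patterns against the markup from the question. The variable names are made up for illustration.

```php
<?php
// Sample markup copied from the question.
$html = '<html> <head> bla bla bla </head> <body class="someclass"> bla bla bla </body> </html>';

// Original pattern: a space sits where the "<" of "</body>" should be.
$broken = "/<body[^>]*>(.*?) \/body>/is";
// Corrected pattern: matches the literal closing tag.
$fixed = "/<body[^>]*>(.*?)<\/body>/is";

// preg_match returns 1 on a match, 0 on no match, false on error.
$brokenCount = preg_match($broken, $html, $brokenMatches);
$fixedCount  = preg_match($fixed, $html, $fixedMatches);

var_dump($brokenCount);            // int(0): " /body>" never occurs in the input
var_dump($fixedCount);             // int(1)
var_dump(trim($fixedMatches[1]));  // string(11) "bla bla bla"
```

The i modifier makes the match case-insensitive and the s modifier lets the dot match newlines, which matters once the body spans several lines. Because the broken pattern never matches, $matches[1] is undefined in the original script, which is why nothing is printed.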
0

You should take a look at PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/

You can get the body with something like this:

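include 'simple_html_dom.php'; // path to the library file; adjust as needed
$html = file_get_html('index.html');
// The second argument (0) returns the first match instead of an array.
$body = $html->find('body', 0);

You can then get the inner HTML with:

$content = $body->innertext;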
rgubby
  • Suggested third party alternatives to [SimpleHtmlDom](http://simplehtmldom.sourceforge.net/) that actually use [libxml](http://xmlsoft.org/html/libxml-HTMLparser.html) instead of String Parsing: [phpQuery](http://code.google.com/p/phpquery/), [Zend_Dom](http://framework.zend.com/manual/en/zend.dom.html), [QueryPath](http://querypath.org/) and [FluentDom](http://www.fluentdom.org). – Gordon Jul 14 '11 at 12:02