-2

Is there a way to extract the content of an HTML page between <body> and </body> in PHP? If so, could anyone post some sample code?

Ariful Islam
bharathi

3 Answers

6

You should have a look at the DOMDocument reference.

This example loads an HTML document into a DOMDocument and retrieves the body element:

libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
libxml_use_internal_errors(false);

$body = $dom->getElementsByTagName('body')->item(0);

echo $body->textContent; // print all the text content in the body
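
Since textContent returns only the text, here is a minimal sketch of one way to get the markup inside the body instead, by serializing each child node with saveHTML. The helper name bodyInnerHtml and the sample string are assumptions made for illustration; they are not part of the answer above.

```php
<?php
// Hypothetical helper: returns the markup inside <body>, or null if none is found.
function bodyInnerHtml(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML rejects an empty string on PHP 8
    }

    $previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $dom->saveHTML($child);         // serialize each child node back to HTML
    }
    return $inner;
}

$sample = '<html><head><title>t</title></head><body><p>Hello <b>world</b></p></body></html>';
echo bodyInnerHtml($sample);
```

The same loop should work after loadHTMLFile as well. Note that the parser may normalize the markup, so the result is not guaranteed to match the source byte for byte.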

You should also check out the following resources:

DOM API Documentation
XPATH language specification
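
As a rough illustration of where XPath fits in, the sketch below runs a query against a parsed document with DOMXPath. The sample markup and the expression //body/p are assumptions chosen for the example.

```php
<?php
// Sample markup, assumed for illustration only.
$html = '<html><body class="home"><h1>Title</h1><p>First</p><p>Second</p></body></html>';

$previous = libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Select every <p> that is a direct child of <body>.
$xpath = new DOMXPath($dom);
$paragraphs = [];
foreach ($xpath->query('//body/p') as $p) {
    $paragraphs[] = $p->textContent;
}

print_r($paragraphs);
```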

Cyclonecode
2

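Try the PHP Simple HTML DOM Parser:

include 'simple_html_dom.php'; // adjust the path to wherever the library file lives
$html = file_get_html('http://www.example.com/');
$body = $html->find('body', 0); // index 0 returns the first match instead of an array
echo $body->innertext; // the markup inside the body tag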
Naveed
1

You can also try a non-DOM solution based on the strpos family of functions:

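$html = file_get_contents($url); // $url holds the address of the page to fetch
$html = substr($html, stripos($html, '<body>') + 6);   // drop everything up to and including <body> (6 characters)
$html = substr($html, 0, strripos($html, '</body>')); // drop the last </body> and everything after it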

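stripos is the case-insensitive version of strpos, and strripos is the case-insensitive version of strrpos, which finds the last ("rightmost") occurrence. Note that this searches for the literal string <body>, so it does not handle a body tag that carries attributes.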
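
The snippet above matches the literal <body>, so it misses a tag such as <body class="home">. Below is a minimal sketch of a more tolerant variant that finds the end of the opening tag separately and returns null when either tag is missing. The helper name bodyBetweenTags is an assumption for illustration.

```php
<?php
// Hypothetical helper: substring-based extraction that tolerates attributes on <body>.
function bodyBetweenTags(string $html): ?string
{
    $open = stripos($html, '<body');       // start of the opening tag, any letter case
    if ($open === false) {
        return null;
    }

    $openEnd = strpos($html, '>', $open);  // first '>' after it closes the opening tag
    if ($openEnd === false) {
        return null;
    }

    $close = strripos($html, '</body>');   // last closing tag, any letter case
    if ($close === false || $close < $openEnd) {
        return null;
    }

    // Everything between the end of the opening tag and the start of the closing tag.
    return substr($html, $openEnd + 1, $close - $openEnd - 1);
}

echo bodyBetweenTags('<html><BODY class="x"><p>Hi</p></Body></html>');
```

This is still plain string matching: a ">" inside an attribute value, or the text "<body" appearing earlier in a comment or script, would throw it off. A DOM parser is the safer choice for arbitrary pages.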

Hope that it will help you!

Vlada Katlinskaya