2

Possible Duplicate:
PHP DOMDocument - get html source of BODY

I have the following code in a variable and am trying to grab everything between the body tags (while keeping the p tags etc.). What's the best way of doing this?

  • preg_match
  • strpos / substr

    <head>
    <title></title>
    </head>
    <body>
        <p>Services Calls2</p>
    </body>
    
Community
user112570

3 Answers

4

Neither. You can use a DOM parser, like DOMDocument:

$dom = new DOMDocument();
$dom->loadHTML($var);

$body = $dom->getElementsByTagName('body')->item(0);

$content = '';

foreach ($body->childNodes as $child) {
  $content .= $dom->saveXML($child);
}
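A self-contained variant of the same idea might look like the sketch below. The helper name `bodyInnerHtml` is hypothetical, and the sketch swaps `saveXML` for `saveHTML($node)` (which accepts a node argument as of PHP 5.3.6) so the output is serialized as HTML rather than XML. It also silences libxml warnings for sloppy markup, which is an assumption about what you want rather than part of the answer above.

```php
<?php
// Hypothetical helper (not part of the original answer):
// returns the markup found inside <body>, nested tags included.
function bodyInnerHtml(string $html): string
{
    if ($html === '') {
        return '';
    }

    $dom = new DOMDocument();

    // Collect parser warnings for imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $content = '';
    foreach ($body->childNodes as $child) {
        // saveHTML($node) serializes the node together with its descendants.
        $content .= $dom->saveHTML($child);
    }

    return trim($content);
}

$var = "<head><title></title></head><body><p>Services <b>Calls2</b></p></body>";
echo bodyInnerHtml($var), "\n";
```

Only the direct children of `<body>` are looped over, but each one is serialized with everything nested inside it, so the `<b>` inside the `<p>` is kept.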
insertusernamehere
nice ass
  • This works well, but I get extra characters added to the start and end of the new variable. How do I remove those as well? – user112570 Jan 26 '13 at 13:17
  • This will not be recursive throughout the body; it will only fetch the first level of tags. (Not a reason for -1 though; it's still the right way to go, it just needs some more code :) – Lilleman Jan 26 '13 at 13:18
  • I think this is quite a slow way of getting a small amount of data, since DOMDocument uses a DOM parser, which reads the whole document first. "Represents an entire HTML or XML document;" (from the docs) – Jason Jan 26 '13 at 13:21
  • @Lilleman: That's not true, contents of the tags are fetched as well – nice ass Jan 26 '13 at 13:21
  • @user112570: that appears to be part of the newline character on Windows. Try normalizing newlines in the string before passing it to the parser, like `$var = str_replace("\r\n", "\n", $var);` – nice ass Jan 26 '13 at 13:33
  • +1 for introducing dom into it, I did something similar, created a different class for the same – Zaffar Saffee Jan 26 '13 at 14:50
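The newline normalization suggested in the comments could be combined with the DOMDocument loop roughly as follows. This is a sketch under assumptions: the sample input with Windows line endings is made up, and the stray characters are taken to come from carriage returns, which `saveXML` tends to emit as a character entity.

```php
<?php
// Made-up input with Windows (CRLF) line endings.
$var = "<body>\r\n<p>Services Calls2</p>\r\n</body>";

// Normalize CRLF, and any stray CR, to LF before parsing.
$var = str_replace(array("\r\n", "\r"), "\n", $var);

$dom = new DOMDocument();
$dom->loadHTML($var);
$body = $dom->getElementsByTagName('body')->item(0);

$content = '';
foreach ($body->childNodes as $child) {
    $content .= $dom->saveXML($child);
}

// trim() drops any whitespace-only text left at either end.
$content = trim($content);
echo $content, "\n";
```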
1

Try this, where $html holds the text:

$s = strpos($html, '<body>') + strlen('<body>');
$f = '</body>';

echo trim(substr($html, $s, strpos($html, $f) - $s));
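The snippet above assumes the opening tag is exactly `<body>` and that both tags are present; if `strpos` returns `false`, the arithmetic silently treats it as 0. A slightly more defensive sketch is shown below. The function name `bodyBetweenTags` is hypothetical, and it still assumes no `>` appears inside an attribute value of the opening tag.

```php
<?php
// Hypothetical helper: returns the text between <body ...> and </body>,
// or null when either tag cannot be found.
function bodyBetweenTags(string $html): ?string
{
    // Case-insensitive search for the start of the opening tag.
    $open = stripos($html, '<body');
    if ($open === false) {
        return null;
    }

    // Skip to the end of the opening tag, which may carry attributes.
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++;

    // Use the last closing tag in the string.
    $end = strripos($html, '</body>');
    if ($end === false || $end < $start) {
        return null;
    }

    return trim(substr($html, $start, $end - $start));
}

$html = "<head></head><BODY class=\"x\">\n<p>Services Calls2</p>\n</BODY>";
echo bodyBetweenTags($html), "\n";
```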
Govil
0

I recommend using preg_match: the contents between <p>Services Calls2</p> can change all the time, so substr or strpos would require fairly convoluted code.

Example:

$a = '<h2><p>Services Calls2</p></h2>';
preg_match("/<p>(?:\w|\s|\d)+<\/p>/", $a, $ar);
var_dump($ar);

The regex only allows letters, digits, underscores and whitespace between the p tags.
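To capture everything between the body tags, as the question asks, rather than a single paragraph, a pattern along these lines could be used. This is only a sketch: the sample `$html` and the `class` attribute are made up, and a regex like this can be fooled by a literal `</body>` inside a script or comment.

```php
<?php
// Made-up sample input; the class attribute is only there to show
// that attributes on the opening tag are tolerated.
$html = "<head><title></title></head>\n<body class=\"x\">\n    <p>Services Calls2</p>\n</body>";

// i: match tag names case-insensitively; s: let "." match newlines too.
// (.*?) lazily captures everything up to the first closing </body>.
if (preg_match('/<body\b[^>]*>(.*?)<\/body>/is', $html, $m)) {
    $content = trim($m[1]);
} else {
    $content = null;
}

var_dump($content);
```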

Jason