
I have:

<html>
<head>
    <title>My Page</title>
</head>
<body>
    <p>paragraph 1</p>
    <p>paragraph 2</p>
    <p>paragraph 3</p>
    <p>paragraph 4</p>
    <ul>
        <li>item # 1</li>
        <li>item # 2</li>
        <li>item # 3</li>
        <li>item # 4</li>
    </ul>
    <a href="#">anchor 1</a>
    <a href="#">anchor 2</a>
    <a href="#">anchor 3</a>
    <a href="#">anchor 4</a>
    <div>div # 1</div>
    <div>div # 2</div>
    <div>div # 3</div>
    <div>div # 4</div>
</body>
</html>

I want to be able to extract a specified tag, let's say a div tag, together with its contents.

So far I have:

$file = file_get_contents('file.html');
$dom = new DOMDocument();
$dom->loadHTML( $file );
$xpath = new DOMXpath( $dom );
$paragraphs = $xpath->query("/html/body//p");

for( $i = 0; $i < $paragraphs->length; $i++ )
{
     # echo the tag and its contents
}

I tried using nodeValue and textContent, but they return only the text inside the tag, not the tag itself together with its contents.

This is my first time using the DOM parser in PHP. I know that using regexes to parse HTML/XML is discouraged, so I am using the DOM parser instead. Any suggestions would help.

Robert

1 Answer


This should work on PHP 5.3.6 and later, where DOMDocument::saveHTML accepts an optional node argument. Pass each node to it and it returns that element's markup, tags included.

for( $i = 0; $i < $paragraphs->length; $i++ )
{
     # $paragraphs is the DOMNodeList returned by $xpath->query() in the question
     echo $dom->saveHTML( $paragraphs->item($i) );
}
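
For completeness, here is a minimal self-contained sketch of the same idea applied to the div tags mentioned in the question. The inline markup stands in for file.html, and outerHtmlOf() is a hypothetical helper name chosen for illustration, not part of the DOM extension.

```php
<?php
// Inline markup standing in for file.html (an assumption for this sketch).
$html = <<<HTML
<html>
<head><title>My Page</title></head>
<body>
    <p>paragraph 1</p>
    <div>div # 1</div>
    <div>div # 2</div>
</body>
</html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: collects the serialized markup (tag plus contents)
// of every node matched by the XPath expression in $query.
function outerHtmlOf(DOMDocument $dom, DOMXPath $xpath, $query)
{
    $result = array();
    // DOMNodeList is traversable, so foreach works without an index.
    foreach ($xpath->query($query) as $node) {
        $result[] = $dom->saveHTML($node);
    }
    return $result;
}

$divs = outerHtmlOf($dom, $xpath, '/html/body//div');
echo implode("\n", $divs), "\n";
```

On PHP versions older than 5.3.6, DOMDocument::saveXML also accepts a node argument and may serve as a fallback, although it serializes as XML rather than HTML, so empty elements can come out self-closed.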

I hope this helps!

Chip Dean