PHP extract html tag and contents

Question

I have:

<html>
<head>
    <title>My Page</title>
</head>
<body>
    <p>paragraph 1</p>
    <p>paragraph 2</p>
    <p>paragraph 3</p>
    <p>paragraph 4</p>
    <ul>
        <li>item # 1</li>
        <li>item # 2</li>
        <li>item # 3</li>
        <li>item # 4</li>
    </ul>
    <a href="#">anchor 1</a>
    <a href="#">anchor 2</a>
    <a href="#">anchor 3</a>
    <a href="#">anchor 4</a>
    <div>div # 1</div>
    <div>div # 2</div>
    <div>div # 3</div>
    <div>div # 4</div>
</body>
</html>

I want to be able to extract a specified tag, let's say a div tag, and its contents.

So far I have

$file = file_get_contents('file.html');
$dom = new DOMDocument();
$dom->loadHTML( $file );
$xpath = new DOMXPath( $dom );
$paragraphs = $xpath->query("/html/body//p");

for( $i = 0; $i < $paragraphs->length; $i++ )
{
     # echo the tag and its contents
}

I tried using nodeValue and textContent, but they only print the text inside the tag, not the tag together with its contents.

This is my first time using the DOM parser in PHP. I know that parsing HTML/XML with regexes is discouraged, so I am using the DOM parser. Any suggestions would help.

score 1 · Accepted Answer · answered Apr 05 '15 at 22:28

This should work for PHP 5.3.6+, the release in which DOMDocument::saveHTML gained its optional node parameter. Just pass the node to that function.

for( $i = 0; $i < $paragraphs->length; $i++ )
{
     # saveHTML() with a node argument returns that node's tag plus its contents
     echo $dom->saveHTML($paragraphs->item($i));
}
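
For the div case from the question, the same idea can be wrapped up roughly like this. This is only a sketch: the function name extractTags is made up for illustration, it uses getElementsByTagName instead of the XPath query from the question, and it assumes PHP 7+ because of the type declarations. The libxml_use_internal_errors call is just one way to keep loadHTML quiet about imperfect markup.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// serialized markup (tag plus contents) of every element named $tagName.
function extractTags(string $html, string $tagName): array
{
    // Collect libxml parse warnings internally instead of emitting them.
    libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $result = [];
    foreach ($dom->getElementsByTagName($tagName) as $node) {
        // Passing a node makes saveHTML() serialize only that node,
        // including its own opening and closing tags.
        $result[] = $dom->saveHTML($node);
    }

    return $result;
}

// Trimmed-down version of the sample document from the question.
$html = '<html><body>'
      . '<p>paragraph 1</p>'
      . '<div>div # 1</div>'
      . '<div>div # 2</div>'
      . '</body></html>';

$divs = extractTags($html, 'div');

foreach ($divs as $outerHtml) {
    echo $outerHtml, "\n";
}
```

Returning an array rather than echoing inside the helper keeps the extraction separate from the output, so the same function could be called with 'p', 'li' or 'a'.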

I hope this helps!

answered Apr 05 '15 at 22:28

Chip Dean

simple and small. just what I needed – Robert Apr 05 '15 at 22:37
`foreach($paragraphs as $paragraph) {` is also simpler. – chris85 Apr 05 '15 at 22:48
why php 5.3.6+ only? – Robert Apr 06 '15 at 01:35
