
Possible Duplicate:
Export particular element in DOMDocument to string

I know how to access different elements by id, but I don't know how to get everything between the opening html tag and the closing html tag. Can anyone please help me? Thanks.

user1825190
  • Your question is unclear. What is your input and desired output? Do you simply want to serialize a DOMDocument to a string? – Francis Avila Dec 29 '12 at 16:52
  • input would be for example, www.hello.php. this url. and that will return everything between including tags like img, div, p – user1825190 Dec 29 '12 at 17:37
  • So you have an html string or url, and you want to return the children of `html` as a string, or as DOM objects, or something else? (You realize it's possible to have valid html that doesn't have the `html` element?) – Francis Avila Dec 29 '12 at 18:11
  • i know what you mean. i will give you an example what i am really after. here is the address of an accessibility checker site. valet.webthing.com/page....... if you just add www.drumstudio.ie... you will see it produce all the code that is written in the "drumstudio" page. i am trying to do the same thing as that. hope it makes it bit clearer. thanks – user1825190 Dec 29 '12 at 18:17

2 Answers


If you would like to parse an HTML page with PHP, you could use PHP's DOMDocument class, like this:

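// a new DOM object
$dom = new DOMDocument;
// keep white space (set before loading so the parser sees it)
$dom->preserveWhiteSpace = true;
// nicely format output
$dom->formatOutput = true;
// load the HTML string held in $html into the object
$dom->loadHTML($html);
// get elements by tag name (a DOMNodeList; ->item(0) is the <html> element)
$htmlRootElement = $dom->getElementsByTagName('html');
// print the whole document, escaped so the browser shows the tags as text
echo htmlspecialchars($dom->saveHTML(), ENT_QUOTES);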
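
If only the html element is wanted, tags included, rather than the whole document, `saveHTML()` also accepts a node (PHP 5.3.6 and later). The following is a minimal sketch under that assumption; the sample markup and the variable names `$root` and `$markup` are made up for illustration.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = '<!doctype html><html><head><title>Demo</title></head>'
      . '<body><h1>hello DOM</h1><p>text</p></body></html>';

$dom = new DOMDocument;
$dom->loadHTML($html);

// getElementsByTagName() returns a DOMNodeList; item(0) is the first match.
$root = $dom->getElementsByTagName('html')->item(0);

// Passing a node to saveHTML() serializes just that node, tags included.
$markup = $dom->saveHTML($root);

// Escape the markup so a browser shows it as text instead of rendering it.
echo htmlspecialchars($markup, ENT_QUOTES), PHP_EOL;
```

The doctype is a sibling of the html element, so it is not part of this output.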

Or you could do this with JavaScript on the client side:

var htmlRootElement = document.getElementsByTagName("html")[0];
alert(htmlRootElement.innerHTML);
Danilo Radenovic
  • i just tried your code there. I got everything in between tags. what i mean is if there is `<h1>hello DOM</h1>` then i got "hello DOM". but what if i want to get the whole line including the h1 tags. This will be true for everything all the divs, all the img tags. if you know what i mean. thanks – user1825190 Dec 29 '12 at 17:12
  • I see. Could you please try to replace the last line with this one echo DOMinnerHTML($htmlRootElement); . I have updated the answer. Or this one echo $dom->saveHTML(); – Danilo Radenovic Dec 29 '12 at 17:25
  • i replaced this line, echo ($htmlRootElement->item(0)->nodeValue); with this line, echo DOMinnerHTML($htmlRootElement); its giving me an error. Call to undefined function DOMinnerHTML() – user1825190 Dec 29 '12 at 17:33
  • nope its giving me an error, Call to undefined function DOMinnerHTML() – user1825190 Dec 29 '12 at 17:39
  • My bad, please try the other one instead - echo $dom->saveHTML(); – Danilo Radenovic Dec 29 '12 at 17:40
  • i will give you an example what i am really after. here is the address of an accessibility checker site. http://valet.webthing.com/page/....... if you just add www.drumstudio.ie... you will see it produce all the code that is written in the "drumstudio" page. i am trying to do the same thing as that. hope it makes it bit clearer. thanks – user1825190 Dec 29 '12 at 17:44
  • Oh, ok. I have updated it again. Please try with this code. – Danilo Radenovic Dec 29 '12 at 17:55
  • thanks a lot mike. i got everything from the page but its all together. i mean one p tag is just after another not in the next line. if you know what i mean. any idea how to place them properly. and thanks again. that was brilliant. :) – user1825190 Dec 29 '12 at 18:03
  • Yes, just remove the line $dom->preserveWhiteSpace = false; . I have just updated the answer. – Danilo Radenovic Dec 29 '12 at 18:11
  • nope sorry did not do anything. – user1825190 Dec 29 '12 at 18:15
  • No, no, i meant now (updated the answer :-) ) $doc->formatOutput = true; – Danilo Radenovic Dec 29 '12 at 18:29
  • nope nothing. still the same, this is what im trying............. $dom = new domDocument; // load the html into the object $dom->loadHTMLFile('http://www.drumstudio.ie'); // keep white space $dom->preserveWhiteSpace = true; // nicely format output $dom->formatOutput = true; //get element by tag name $htmlRootElement = $dom->getElementsByTagName('html'); echo htmlspecialchars($dom->saveHTML(), ENT_QUOTES); – user1825190 Dec 29 '12 at 18:41
  • hi mike thanks for all your help, i have added one thing then it works the way i want it to be....................just added
                  $dom = new domDocument;
    // load the html into the object
    $dom->loadHTMLFile('http://www.drumstudio.ie');
    // keep white space
    $dom->preserveWhiteSpace = true;
    // nicely format output
    $dom->formatOutput   = true;
    //get element by tag name
    $htmlRootElement = $dom->getElementsByTagName('html');
    $new = htmlspecialchars($dom->saveHTML(), ENT_QUOTES);
    echo '
    ' .$new. '
    ';
    – user1825190 Dec 29 '12 at 23:44
  • Wow, great! I have been trying to figure it out and eventually gave up ... Thanks for the update! – Danilo Radenovic Dec 30 '12 at 08:33
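
The snippet in the last code comment seems to have lost whatever markup was inside its `echo` strings. One plausible reading, which the thread does not confirm, is that the escaped output was wrapped in `<pre>` tags so the browser keeps the line breaks. A sketch under that assumption, using hypothetical sample markup instead of the live URL:

```php
<?php
// Hypothetical sample markup; the thread loaded a live URL with loadHTMLFile().
$html = "<html><body>\n<p>one</p>\n<p>two</p>\n</body></html>";

$dom = new DOMDocument;
$dom->loadHTML($html);

// Escape the serialized document so the tags are displayed, not rendered.
$escaped = htmlspecialchars($dom->saveHTML(), ENT_QUOTES);

// Assumption: <pre> is the wrapper; it makes the browser honor newlines.
$output = '<pre>' . $escaped . '</pre>';
echo $output, PHP_EOL;
```

Without such a wrapper, a browser collapses the newlines in the escaped source into single spaces, which matches the "all together" symptom described in the comments.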

You can access each child of the <html> element with the DOMDocument class.

Example

$htmlDoc = new DOMDocument;

$html = <<<HTML
<!doctype html>
<html>
<head>
    <meta charset="utf-8">
    <title>My Site</title>
    <meta name="description" content="DOM test">
</head>
<body>
    <h1>Hello</h1>
    <p>This is a DOM test</p>
</body>
</html>
HTML;

$htmlDoc->loadHTML($html);
$htmlElement = $htmlDoc->getElementsByTagName("html");

foreach ($htmlElement->item(0)->childNodes as $element) {
    echo 'Element name: ' . $element->nodeName . PHP_EOL;
    echo 'Element value: '. $element->nodeValue . PHP_EOL;
}
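
Note that `nodeValue` yields only the text inside an element. To get each element's markup with its tags, one option is to pass the node to `saveHTML()`. A minimal sketch, assuming PHP 5.3.6 or later; the `$fragments` array is a name introduced here for illustration:

```php
<?php
// Shortened version of the sample document used above.
$html = '<!doctype html><html><head><title>My Site</title></head>'
      . '<body><h1>Hello</h1><p>This is a DOM test</p></body></html>';

$htmlDoc = new DOMDocument;
$htmlDoc->loadHTML($html);

$body = $htmlDoc->getElementsByTagName('body')->item(0);

$fragments = [];
foreach ($body->childNodes as $child) {
    // Skip text and comment nodes; keep element nodes only.
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue;
    }
    // Serialize this one element, tags included; trim any serializer newline.
    $fragments[] = trim($htmlDoc->saveHTML($child));
}

echo implode(PHP_EOL, $fragments), PHP_EOL;
```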
Adam Elsodaney
  • i tried something like this but getting error. Fatal error: Cannot use object of type DOMNodeList as array $htmlDoc = new DOMDocument; $htmlDoc->loadHTML("http://www.drumstudio.ie/index.html"); $htmlElement = $htmlDoc->getElementsByTagName("html"); foreach ($htmlElement[0]->childNodes as $element) { echo 'Element name: ' . $element->nodeName . PHP_EOL; echo 'Element value: '. $element->nodeValue . PHP_EOL; } – user1825190 Dec 29 '12 at 17:26
  • Whoops, forgot to use `item()` method to access first element (There's only one). I know you've already accepted another answer but I thought I'd fix this one ;) – Adam Elsodaney Dec 30 '12 at 15:16
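
A side note on the snippet in the first comment: `loadHTML()` takes the markup itself as a string, while `loadHTMLFile()` takes a path or URL and reads the markup from there, so a URL passed to `loadHTML()` would be parsed as literal text. The sketch below illustrates the difference with a hypothetical temporary file in place of a remote page; loading an actual URL would additionally depend on the PHP configuration allowing it.

```php
<?php
// Hypothetical temp file standing in for a remote page.
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents(
    $path,
    '<!doctype html><html><head><title>From file</title></head>'
    . '<body><p>Hi</p></body></html>'
);

// loadHTML() expects the markup itself as a string.
$fromString = new DOMDocument;
$fromString->loadHTML(file_get_contents($path));

// loadHTMLFile() expects a path (or URL) and reads the markup from it.
$fromFile = new DOMDocument;
$fromFile->loadHTMLFile($path);

$titleA = $fromString->getElementsByTagName('title')->item(0)->textContent;
$titleB = $fromFile->getElementsByTagName('title')->item(0)->textContent;
echo $titleA, ' / ', $titleB, PHP_EOL;

// Clean up the temporary file.
unlink($path);
```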