
I'm struggling big time understanding how to use the DOMElement object in PHP. I found this code, but I'm not really sure it's applicable to me:

$dom = new DOMDocument();
$dom->loadHTML("index.php");

$div = $dom->getElementsByTagName('div');
foreach ($div->attributes as $attr) {
     $name = $attr->nodeName;
     $value = $attr->nodeValue;
     echo "Attribute '$name' :: '$value'<br />";
}

Basically what I need is to search the DOM for an element with a particular id, after which point I need to extract a non-standard attribute (i.e. one that I made up and put on with JS) so I can see the value of that. The reason is I need one piece from the $_GET and one piece that is in the HTML based from a redirect. If someone could just explain how I use DOMDocument for this purpose, that would be helpful. I'm really struggling understanding what's going on and how to properly implement it, because I clearly am not doing it right.

EDIT (Where I'm at based on comment):

This is my code lines 4-26 for reference:

<div id="column_profile">
    <?php
        require_once($_SERVER["DOCUMENT_ROOT"] . "/peripheral/profile.php");            
        $searchResults = isset($_GET["s"]) ? performSearch($_GET["s"]) : "";

        $dom = new DOMDocument();
        $dom->load("index.php");

        $divs = $dom->getElementsByTagName('div');
        foreach ($divs as $div) {
            foreach ($div->attributes as $attr) {
              $name = $attr->nodeName;
              $value = $attr->nodeValue;
              echo "Attribute '$name' :: '$value'<br />";
            }
        }
        $div = $dom->getElementById('currentLocation');
        $attr = $div->getAttribute('srckey');   
        echo "<h1>{$attr}</a>";
    ?>
</div>

<div id="column_main">

Here is the error message I'm getting:

Warning: DOMDocument::load() [domdocument.load]: Extra content at the end of the document in ../public_html/index.php, line: 26 in ../public_html/index.php on line 10

Fatal error: Call to a member function getAttribute() on a non-object in ../public_html/index.php on line 21
Deduplicator
Matt
  • `index.php` won't be executed. `loadHTML` just reads the contents of the file, it doesn't run it. You may need to do something like: `$dom->loadHTML(file_get_contents('http://localhost/index.php'))`. – gen_Eric Nov 16 '11 at 14:18

2 Answers


getElementsByTagName returns a list of elements, so you first need to loop through the elements and then through each element's attributes.

$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
    foreach ($div->attributes as $attr) {
      $name = $attr->nodeName;
      $value = $attr->nodeValue;
      echo "Attribute '$name' :: '$value'<br />";
    }
}

In your case, you said you need an element with a specific ID. IDs are supposed to be unique, so you can use getElementById (note that it might not work unless you call $dom->validate() first):

$div = $dom->getElementById('divID');

Then to get your attribute:

$attr = $div->getAttribute('customAttr');
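Putting those two calls together, here is a minimal sketch of the whole lookup. It assumes the markup is already available as a string; the `$html` value below is made up for illustration, and `currentLocation` and `srckey` are simply the names used in the question. Since getElementById returns null when nothing matches, the sketch checks for that before calling getAttribute, which avoids the "non-object" fatal error.

```php
<?php
// Hypothetical markup standing in for the rendered page (an assumption for
// illustration); 'currentLocation' and 'srckey' are the names from the question.
$html = '<!DOCTYPE html><html><body>'
      . '<div id="currentLocation" srckey="abc123">Here</div>'
      . '</body></html>';

$dom = new DOMDocument();

// Real-world HTML often triggers libxml warnings; collect them instead of printing.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$div = $dom->getElementById('currentLocation');

// getElementById() returns null when no element matches, so check first.
$srckey = ($div !== null) ? $div->getAttribute('srckey') : null;

// Escape before echoing, since the value ends up in HTML output.
echo htmlspecialchars((string) $srckey), "\n";
```

If `$srckey` comes back as null, the element was not found in the parsed markup, which is worth checking before looking anywhere else.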

EDIT: $dom->loadHTML parses the string you pass it as markup; it does not open a file, and it does not execute anything. index.php won't be run this way. You might have to do something like:

$dom->loadHTML(file_get_contents('http://localhost/index.php'));
gen_Eric
  • Does this work if your HTML does not contain a doctype declaration? A comment on the [DOMDocument::getElementById](http://us3.php.net/manual/en/domdocument.getelementbyid.php) documentation page suggests that if the HTML does not contain a doctype declaration, `getElementById()` always returns `null`. – Jay Bienvenu Jan 08 '18 at 15:44
  • Not sure what that comment is trying to say. `DOMDocument` works just fine on HTML without a `<!DOCTYPE>`. Demo: https://3v4l.org/0mGrg – gen_Eric Jan 08 '18 at 17:56
  • Yes, I'm using the DOM library to compose HTML in that manner. But I'm trying to run tests against the composed HTML. `getElementById()` always returns `null` even when it's clearly in the rendered HTML. – Jay Bienvenu Jan 08 '18 at 18:18
  • I went ahead and composed [my own question](https://stackoverflow.com/questions/48156555/php-document-model-finding-an-element-in-a-composed-html-document). – Jay Bienvenu Jan 08 '18 at 19:09

You won't have access to the HTML if the redirect is from an external server. Let me put it this way: the DOM does not exist at the point you are trying to parse it. What you can do is pass the text to a DOM parser and then manipulate the elements that way. Or the better way would be to add it as another GET variable.

EDIT: Are you also aware that the client can change the HTML and have it pass whatever they want? (Using a tool like Firebug)
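As a rough sketch of the GET-variable approach, assuming the value is passed in a query parameter named `srckey` (that name and the `readSrcKey` helper are hypothetical, not something from the question): because the client controls the value, the helper only accepts strings of an expected shape.

```php
<?php
// Hypothetical helper: read the value from the query string instead of the HTML.
// The parameter name 'srckey' and the allowed pattern are assumptions.
function readSrcKey(array $query): ?string
{
    if (!isset($query['srckey']) || !is_string($query['srckey'])) {
        return null;
    }

    // The client can send anything, so accept only an expected shape.
    return preg_match('/^[A-Za-z0-9_-]{1,64}$/', $query['srckey']) === 1
        ? $query['srckey']
        : null;
}

$srckey = readSrcKey($_GET);
```

The redirect would then need to include the value in its URL, for example `index.php?s=...&srckey=...`, and the page should treat a null result as "not provided".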

jakx