I have a bit of PHP that grabs the HTML from a page and loads it into a SimpleXML object. However, it's not picking up the class of the a element within each li.

The PHP:

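// Load the HTML page with curl ($ch is assumed to be a handle already
// configured with CURLOPT_RETURNTRANSFER so that curl_exec returns the body)
$html = curl_exec($ch);
curl_close($ch);

// Parse the HTML into a DOM tree, then wrap the tree as a SimpleXML object
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);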

The page HTML. A var_dump of $html confirms that it has been scraped and is present in $html:

    <li class="large">
        <a style="" id="ref_3" class="off" href="#" onmouseover="highlightme('07');return false;" onclick="req('379');return false;" title="">07</a>
    </li>

The var_dump of $doc and $sxml (below) shows that the class 'off' on the a element is now missing. Unfortunately, I need to process the page based on this class.

            [8]=>
             object(SimpleXMLElement)#50 (2) {
              ["@attributes"]=>
              array(1) {
                ["class"]=>
                string(16) "large"
              }
              ["a"]=>
              string(2) "08"
            }
Paul M

1 Answer

Use simplexml_load_file and xpath; see the inline comments.

What you are really after, once you have found the element you need, is this:

$row->a->attributes()->class=="off"
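
As a minimal sketch of that check, assuming markup shaped like the snippet in the question (the inline `$html` string and the `//li` query are illustrative, not taken from the real page), the class can be read either through `attributes()` or with array-style access. As far as I know, `var_dump` of a parent element summarises a text-only child such as `a` as a plain string, so the child's attributes may be absent from the dump while still being reachable in code:

```php
<?php
// Illustrative markup only, modelled on the <li>/<a> snippet in the question.
$html = '<ul><li class="large"><a id="ref_3" class="off" href="#">07</a></li></ul>';

// Same loading path as in the question: DOMDocument, then simplexml_import_dom.
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);

// xpath() returns an array of SimpleXMLElement objects; take the first <li>.
$items = $sxml->xpath('//li');
$li = $items[0];

// Two equivalent ways to read the class attribute of the nested <a>.
$viaAttributes = (string) $li->a->attributes()->class;
$viaArray      = (string) $li->a['class'];

print $viaAttributes . PHP_EOL;
print $viaArray . PHP_EOL;
```

Casting to string is deliberate: attribute access returns a SimpleXMLElement, so a strict comparison against "off" only works after the cast.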

The full code is below:

// $xml is assumed to be the SimpleXMLElement returned by simplexml_load_file()
// let's take all the elements that have the class "stff_grid"
$divs = $xml->xpath("//*[@class='stff_grid']");

// for each of these elements, let's print out the value inside the first p tag
foreach($divs as $div){
    print $div->p->a . PHP_EOL;

    // now for each li tag let's print out the contents inside the a tag
    foreach ($div->ul->li as $row){

        // same as before
        print "  - " . $row->a;
        if ($row->a->attributes()->class=="off") print " *off*";
        print PHP_EOL;

        // or shorter
        // print "  - " . $row->a . (($row->a->attributes()->class=="off")?" *off*":"") . PHP_EOL;

    }
}
/* this outputs the following
Person 1
  - 1 hr *off*
  - 2 hr
  - 3 hr *off*
  - 4 hr
  - 5 hr
  - 6 hr *off*
  - 7 hr *off*
  - 8 hr
Person 2
  - 1 hr
  - 2 hr
  - 3 hr
  - 4 hr
  - 5 hr
  - 6 hr
  - 7 hr *off*
  - 8 hr *off*
Person 3
  - 1 hr
  - 2 hr
  - 3 hr
  - 4 hr *off*
  - 5 hr
  - 6 hr
  - 7 hr *off*
  - 8 hr
*/
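
To try the loop without the original page, here is a self-contained sketch. The markup in `$sample` is hypothetical, invented only to match the `stff_grid` / `p` / `ul` / `li` / `a` structure the code above expects (including the `on` class value), and `simplexml_load_string` stands in for `simplexml_load_file` so that no external file is needed. The `$lines` array is also just a convenience introduced here for collecting the output.

```php
<?php
// Hypothetical sample shaped like the structure the loop above walks.
$sample = <<<XML
<root>
  <div class="stff_grid">
    <p><a href="#">Person 1</a></p>
    <ul>
      <li class="large"><a class="off" href="#">1 hr</a></li>
      <li class="large"><a class="on" href="#">2 hr</a></li>
    </ul>
  </div>
</root>
XML;

$xml = simplexml_load_string($sample);

$lines = [];
foreach ($xml->xpath("//*[@class='stff_grid']") as $div) {
    // Text of the link inside the first <p>
    $lines[] = (string) $div->p->a;

    foreach ($div->ul->li as $row) {
        // Cast the attribute to a string before comparing it
        $isOff = ((string) $row->a['class'] === 'off');
        $lines[] = '  - ' . $row->a . ($isOff ? ' *off*' : '');
    }
}

print implode(PHP_EOL, $lines) . PHP_EOL;
```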
Alex Andrei
  • OK, this works, and a var_dump shows that the attributes are indeed there in the object, whereas they are not in my original post, which is odd. I will need to work backwards to understand why my original object wasn't loading these attributes. Thanks for all your help; I have created a bounty and will award it to you. – Paul M Oct 22 '15 at 18:53
  • You are welcome, @PaulM! But the bounty is not really necessary. – Alex Andrei Oct 22 '15 at 19:00
  • For anyone else who gets this far: it appears that just switching from simplexml_import_dom to simplexml_load_file made sure all the attributes on all elements were in the object, so I could then access them. – Paul M Oct 23 '15 at 08:45