
What is the right XPath syntax to get the contents of all divs with a certain class? I seem to be getting the divs, but I don't know how to get their innerHTML.

    $url = "http://www.vanityfair.com/politics/2012/10/michael-lewis-profile-barack-obama";

    $ctx = stream_context_create(array('http' => array('timeout' => 10)));

    libxml_use_internal_errors(TRUE);
    $num = 0;

    if($html = @file_get_contents($url, false, $ctx)){

        // loadHTML() is an instance method, so create the document first
        $doc = new DOMDocument();
        $doc->loadHTML($html);
        $xpath = new DOMXPath($doc);

        foreach($xpath->query('//div[@class="page-display"]') as $div){
            $num++;
            echo "$num. ";

            //????

            echo "<br/>";
        }

        echo "<br/>FINISHED";

    }else{
        echo "FAIL";
    }
David

1 Answer


There is no HTML inside the divs with class="page-display", so you're not going to get anything from them at all.

Did you mean to get the divs with class="parbase cn_text"?

    foreach($xpath->query('//div[@class="parbase cn_text"]') as $div){
        $num++;
        echo "$num. ";

        // textContent returns the div's text with all tags stripped
        echo $div->textContent;

        echo "<br/>";
    }
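The question asks for the inner HTML, whereas `textContent` returns only the text with the tags stripped. One way to get the markup itself is to serialize each child node of the div with `DOMDocument::saveHTML()`, which accepts a node argument from PHP 5.3.6 on. This is a minimal sketch under a few assumptions: `innerHTML()` is a hypothetical helper name, and a small inline HTML string stands in for the live Vanity Fair page so the example runs offline.

```php
<?php
// Hypothetical helper: returns the markup *inside* $node (its inner HTML)
// by serializing each child node with DOMDocument::saveHTML().
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Assumed sample markup standing in for the real page.
$sample = '<html><body>'
    . '<div class="parbase cn_text"><p>First <b>bold</b> paragraph</p></div>'
    . '<div class="parbase cn_text"><p>Second paragraph</p></div>'
    . '</body></html>';

libxml_use_internal_errors(true); // silence warnings from messy real-world HTML
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$results = [];
foreach ($xpath->query('//div[@class="parbase cn_text"]') as $div) {
    $results[] = innerHTML($div);
}

foreach ($results as $i => $html) {
    // htmlspecialchars() only makes the tags visible as text in a browser
    echo ($i + 1) . '. ' . htmlspecialchars($html) . "<br/>\n";
}
```

Drop the `htmlspecialchars()` call if you want the extracted HTML rendered rather than displayed as source.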
Robbie
  • Wow, I'm stupid. That works, but how come the class "body" doesn't work? – David Sep 21 '12 at 02:32
  • You mean "body " ("body[space]")? – Robbie Sep 21 '12 at 02:37
  • Oh dang, spaces count? That's annoying. So if you had 4 divs with classes 1. "body test" 2. "test body example" 3. "example test body" 4. "body", you would need to query on "body ", " body ", " body", and "body"? – David Sep 21 '12 at 02:41
  • And any other combination that mixes in "body", so it's best to use regular expressions, as http://arr.gr/blog/2010/04/xpath-regular-expression-matching-in-php-5-3/ suggests you can. I haven't tried it, though. Another example: http://stackoverflow.com/questions/10335736/xpath-query-with-regex – Robbie Sep 21 '12 at 02:50
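A common alternative to regular expressions uses only XPath 1.0 functions (libxml2, which backs `DOMXPath`, implements XPath 1.0): collapse the whitespace in the class attribute with `normalize-space()`, pad it with a space on each side, and look for the token padded the same way. This is a different technique from the regex approach linked above, shown as a minimal sketch; `classQuery()` is a hypothetical helper name, and it assumes the class name contains no double quotes. The sample uses the four class values from David's comment plus one that should not match.

```php
<?php
// Hypothetical helper: builds a query for divs whose class attribute contains
// the whole token $class anywhere in its space-separated list.
// Assumes $class contains no double quotes (it is not escaped).
function classQuery(string $class): string
{
    // Padding both sides with spaces stops partial matches such as "bodywork".
    return sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
}

$sample = '<html><body>'
    . '<div class="body test">1</div>'
    . '<div class="test body example">2</div>'
    . '<div class="example test body">3</div>'
    . '<div class="body">4</div>'
    . '<div class="bodywork">5</div>'
    . '</body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$matched = [];
foreach ($xpath->query(classQuery('body')) as $div) {
    $matched[] = $div->textContent;
}

echo implode(', ', $matched), "\n"; // prints "1, 2, 3, 4"
```

A single query covers every position of "body" in the list, so there is no need to enumerate "body ", " body ", " body" and "body" separately.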