        $html = new DOMDocument();
        @$html->loadHtmlFile($url);
        $xpath = new DOMXPath( $html );
        //Query to pull all reviews on the page
        $q="//div[starts-with(@id,'empReview_')]/h3/meta[1]/@content";
        $nodelist = $xpath->query($q);

        foreach ($nodelist as $n){
            echo $n->nodeValue;
            echo"<br><br>";

        }

The code above is the query I'm attempting to run against the following markup:

<div id="empReview_2055942" class="employerReview" itemscope="" itemtype="http://schema.org/Review">
    <h2 class="summary">
    <h3 class="review-microdata-heading">
        <span class="gdRatingStars"> </span>
        Former 
        <span itemprop="author">IT Engineer Intern in Santa Clarita, CA</span>
        <meta content="4" itemprop="reviewRating"/>

The XPath goes right to the element when I test it in FirePath, but my PHP code does not echo the value.

Any help would be greatly appreciated.

Ripon Al Wasim

1 Answer


Your XPath query is wrong; it should be:

$q="//div[starts-with(@id,'empReview_')]/h2/h3/meta[1]/@content";

(You are missing 'h2', and the string literal 'empReview_' needs quotes.) Your code should look like this:

$html = new DOMDocument();

$html->loadHtml('
  <div id="empReview_2055942" class="employerReview" itemscope="" itemtype="http://schema.org/Review">
      <h2 class="summary">
      <h3 class="review-microdata-heading">
          <span class="gdRatingStars"> </span>
          Former 
          <span itemprop="author">IT Engineer Intern in Santa Clarita, CA</span>
          <meta content="4" itemprop="reviewRating"/>
      </h3> <!-- presumably your input has closing tags -->
      </h2>
  </div>
  <div id="empReview_2055947" class="employerReview" itemscope="" itemtype="http://schema.org/Review">
      <h2 class="summary">
      <h3 class="review-microdata-heading">
          <span class="gdRatingStars"> </span>
          Former 
          <span itemprop="author">Some Other Random Thing</span>
          <meta content="7" itemprop="reviewRating"/>
      </h3>
      </h2>
  </div>
');

$xpath = new DOMXPath( $html );
//Query to pull all reviews on the page
$q="//div[starts-with(@id,'empReview_')]/h2/h3/meta[1]/@content";
$nodelist = $xpath->query($q);

foreach ($nodelist as $n){
  echo $n->nodeValue;
  echo"<br><br>\n";
}

I get the following output:

4<br><br>
7<br><br>
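
A note on the predicate itself: when the second argument of starts-with() is written without quotes, XPath 1.0 reads empReview_ as a child element name, not a string. That node-set is empty, it converts to an empty string, and the test then holds for every div. The sketch below compares the two forms. The sample markup (including the unrelated "sidebar" div) is made up for illustration, and selecting the meta element by its itemprop through the descendant axis is one possible way to avoid depending on whether the h3 sits inside the h2.

```php
<?php
// Hypothetical sample markup: one review block plus an unrelated div.
$markup = '
  <div id="empReview_2055942" class="employerReview">
      <h3 class="review-microdata-heading">
          <span itemprop="author">IT Engineer Intern in Santa Clarita, CA</span>
          <meta content="4" itemprop="reviewRating"/>
      </h3>
  </div>
  <div id="sidebar">
      <h3><meta content="99" itemprop="other"/></h3>
  </div>';

// Collect parser warnings internally rather than printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Unquoted: empReview_ is an (empty) node-set, so every div passes the test.
$unquoted = $xpath->query("//div[starts-with(@id,empReview_)]/h3/meta[1]/@content");

// Quoted literal, and the rating selected by itemprop at any depth below the div.
$quoted = $xpath->query(
    "//div[starts-with(@id,'empReview_')]//meta[@itemprop='reviewRating']/@content"
);

echo "unquoted matches: ", $unquoted->length, "\n";
foreach ($quoted as $n) {
    echo "rating: ", $n->nodeValue, "\n";
}
```

With the quoted literal only the review div is considered, so the unrelated meta element is not picked up.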
Jacob
Just kidding - the HTML you posted wasn't correct (it looks like you are scraping glassdoor.com?). The h3 isn't nested inside the h2 as it appears to be in the HTML you posted. So your original query works fine for me. I get this result:

    5, 5, 4, 4, 4, 4, 4, 4, 4, 4

    For this page: http://www.glassdoor.com/Reviews/Cisco-Systems-Reviews-E1425.htm See: http://verde.sheckel.net/test2.php.txt , result: http://verde.sheckel.net/test2.php
    – Jacob Nov 09 '12 at 07:54
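
If the query itself is fine, an empty page can also come from the load step: the `@` in front of loadHtmlFile() hides any warning, so a failed fetch looks the same as "no matches". The following is a minimal sketch for telling the two apart, not a definitive fix. fetchRatings is a hypothetical helper name, and a local temporary file stands in for the real URL so the example runs offline.

```php
<?php
// Hypothetical helper: load an HTML file (or URL) and return the review
// ratings, failing loudly instead of silently printing nothing.
function fetchRatings(string $source): array
{
    // Collect libxml messages internally instead of suppressing them with "@".
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $loaded = $doc->loadHTMLFile($source);
    $messages = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        throw new RuntimeException(
            "Could not load $source (" . count($messages) . " libxml messages)"
        );
    }

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        "//div[starts-with(@id,'empReview_')]//meta[@itemprop='reviewRating']/@content"
    );

    $ratings = [];
    foreach ($nodes as $node) {
        $ratings[] = $node->nodeValue;
    }
    return $ratings;
}

// Usage: a temporary file stands in for the downloaded page (assumption).
$path = tempnam(sys_get_temp_dir(), 'review');
file_put_contents(
    $path,
    '<div id="empReview_1"><h3><meta content="4" itemprop="reviewRating"/></h3></div>'
);
$ratings = fetchRatings($path);
unlink($path);

echo implode(', ', $ratings), "\n";
```

An empty array from this helper means the page loaded but nothing matched, while an exception points at the download or parse step.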