
I'm trying to count the div elements with class classtosearch in a page fetched with curl.

Piece of the HTML:

<div class="classtosearch other_class">Content</div>
<div class="classtosearch other_class_1">Content</div>
<div class="classto_NOT_search other_class">Content</div>

This is my code:

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_FAILONERROR, true); 
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL,"http://www.example.org");

$buf2 = curl_exec($ch);

curl_close ($ch);


$doc = new DOMDocument();
$doc->loadHTML($buf2);
$xpath = new DOMXPath($doc);
$divs= $xpath->query("//div[@class='classtosearch']");

echo "Found " . $divs->length . " divs";

I always get 0 as the result, along with a lot of warnings:

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity
Warning: DOMDocument::loadHTML(): Tag footer invalid in Entity

But my test page has 2 divs with class classtosearch.

Giuseppe Lodi Rizzini
  • Have you dumped `$buf2` to assert that it contains what you think it does? – Bananaapple Nov 07 '19 at 09:51
  • Check https://stackoverflow.com/questions/7082401/avoid-domdocument-xml-warnings-in-php see if this helps. – Nigel Ren Nov 07 '19 at 10:01
  • @Bananaapple yes if I print with var_dump($buf2) I get a string variable with all HTML code returned and I see 2 divs needed – Giuseppe Lodi Rizzini Nov 07 '19 at 10:04
  • @NigelRen suppressing the warnings is no problem, but I still don't get the right count, and that is the main problem – Giuseppe Lodi Rizzini Nov 07 '19 at 10:06
  • Very difficult to check unless we can see the HTML it's working with (either a segment or just the URL you are using). – Nigel Ren Nov 07 '19 at 10:07
  • @NigelRen ok, I've added the piece of code. Meanwhile I found the problem but not the solution: $xpath->query only matches divs whose class attribute is exactly this class. If a div has other classes too, it is not found. – Giuseppe Lodi Rizzini Nov 07 '19 at 10:20
  • OK - added a duplicate which should sort your problem. Just a case of getting the right XPath – Nigel Ren Nov 07 '19 at 10:22
  • +1 for using XPath, that's the correct way to do it! (most people would use regex instead, which is the wrong way to do it) you need XPath contains() function, ```$the_count=(new DOMXPath(@DOMDocument::loadHTML($html)))->query("//div[contains(@class,'classtosearch')]")->length;``` – hanshenrik Nov 07 '19 at 22:49
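
A minimal sketch of what the comments point to, assuming the sample markup from the question stands in for the curl response (`$html` here plays the role of `$buf2`). It swaps the exact-match predicate `@class='classtosearch'` for a whole-token match, since a plain `contains(@class, 'classtosearch')` would also accept a longer class name such as `classtosearch_extra`. It also uses `libxml_use_internal_errors(true)` so the parser warnings are collected instead of printed. Note that the static `DOMDocument::loadHTML(...)` call in the last comment should fail on PHP 8, so the sketch creates an instance instead.

```php
<?php
// Sample markup from the question, standing in for the curl response ($buf2).
$html = <<<HTML
<div class="classtosearch other_class">Content</div>
<div class="classtosearch other_class_1">Content</div>
<div class="classto_NOT_search other_class">Content</div>
HTML;

// Collect libxml parse problems internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected parse errors once the document is loaded.
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Pad the normalized class attribute with spaces and look for the padded
// class name, so only a whole class token matches.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' classtosearch ')]";
$divs = $xpath->query($query);

echo "Found " . $divs->length . " divs\n";
```

With the three sample divs this should report 2, while the original `//div[@class='classtosearch']` query reports 0 because no div has `classtosearch` as its entire class attribute.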
