
Hi everyone, I'm a noob at this.

https://decisoesesolucoes.com/agencias/albergaria/consultores

On the page at the URL above, I want to count the occurrences of 'consultor imobiliario' and 'Consultora Imobiliária'. Both texts are surrounded by extra whitespace, which is why I'm using normalize-space().

For example, this expression works for the first text:

"//*[text()[normalize-space() = 'consultor imobiliario']]"

But if I also want to count 'Consultora Imobiliária', this doesn't work:

"//*[text()[normalize-space() = 'consultor imobiliario' and 'Consultora Imobiliária']]"

(If I use OR instead of AND, the count is wrong: it comes out far too high.)
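Both symptoms can be reproduced offline. In XPath 1.0, `and`/`or` convert their operands to booleans, and a non-empty string literal converts to true, so `... and 'Consultora Imobiliária'` reduces to the first comparison alone, while `... or 'Consultora Imobiliária'` is true for every text node. A minimal sketch on made-up sample markup (the HTML, the names, and the `count_matches()` helper are invented for illustration, not taken from the real page):

```php
<?php
// Sketch on made-up sample markup (not the real page); names are invented.

// Hypothetical helper: count the nodes an XPath query selects in an HTML string.
function count_matches(string $html, string $query): int
{
    libxml_use_internal_errors(true); // collect parser warnings instead of using @
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($dom))->query($query)->length;
}

// &#225; is "á", written as an entity so the sample does not depend on input encoding.
$html = '<html><body>'
      . '<div><h3>Ana</h3><p> Consultora Imobili&#225;ria </p></div>'
      . '<div><h3>Rui</h3><p>  consultor imobiliario  </p></div>'
      . '<div><h3>Sara</h3><p> Consultora Imobili&#225;ria e Financeira </p></div>'
      . '</body></html>';

// boolean('Consultora Imobiliária') is true, so this reduces to the first comparison.
$andQuery = "//*[text()[normalize-space() = 'consultor imobiliario' and 'Consultora Imobiliária']]";
// Here the "or" branch is always true, so every element with a text child matches.
$orQuery = "//*[text()[normalize-space() = 'consultor imobiliario' or 'Consultora Imobiliária']]";

echo count_matches($html, $andQuery), PHP_EOL; // 1: only the "consultor imobiliario" <p>
echo count_matches($html, $orQuery), PHP_EOL;  // every <h3> and <p> in the sample
```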

My entire code is:

```php
$current_page = 1;
$max_page = 999999999999;
$countTotalConsultores = 0;

while ($max_page >= $current_page) {
    // Fetch one page of the agents listing
    $url = "https://decisoesesolucoes.com/agencias/albergaria/consultores?page=";
    $url .= $current_page;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $res = curl_exec($ch);

    curl_close($ch);

    // Parse the HTML (@ suppresses parser warnings)
    $dom = new DomDocument();
    @$dom->loadHTML($res);

    // Count the elements whose text matches the labels
    $xpath = new DOMXpath($dom);
    $body = $xpath->query("//*[text()[normalize-space() = 'consultor imobiliario' and 'Consultora Imobiliária']]");
    $count = $body->length;

    $countTotalConsultores = $countTotalConsultores + $count;

    echo "        Página atual:" . $current_page . "No. of agents " . $countTotalConsultores;

    $current_page = $current_page + 1;

    // Stop at the first page with no matches
    if ($count < 1) {
        break;
    }
}
```

Can anyone help me, please?

Zuclaspt
    If you only want to count the occurrences, why not use `substr_count()` on `$res`? – Michel Aug 14 '21 at 13:20
  • @Michel , PHP Warning: substr_count() expects at least 2 parameters, 1 given in /volume1/www/iad/tools/curling_decisoes_solucoes_consultores.php on line 28 – Zuclaspt Aug 14 '21 at 13:29
  • Can you help me with the code? – Zuclaspt Aug 14 '21 at 13:29
  • @Michel in practice that should work fine. in theory it's subject to issues like "[parsing HTML with regex](https://stackoverflow.com/a/1732454/1067003)", eg it would incorrectly count `````` instances (in practice there won't be any such instances, it's a purely theoretical issue :P ) – hanshenrik Aug 16 '21 at 12:06
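Michel's `substr_count()` idea needs two arguments, the haystack first and the needle second, which is why calling it with one argument produces the warning quoted above. A minimal sketch, where `$res` stands in for the `curl_exec()` result and is replaced by made-up markup. It assumes the labels appear in the raw HTML exactly as written: `substr_count()` is case-sensitive, compares raw bytes, does not collapse line breaks inside a label (the reason for `normalize-space()` in the question), and counts matches anywhere in the source.

```php
<?php
// $res would normally hold the curl_exec() result; this is invented sample markup.
$res = '<p>consultor imobiliario</p><p>Consultora Imobiliária</p>'
     . '<p>Consultora Imobiliária e Financeira</p>';

// substr_count(haystack, needle): one call per label, summed.
$count = substr_count($res, 'consultor imobiliario')
       + substr_count($res, 'Consultora Imobiliária');

echo $count, PHP_EOL; // 3: the "e Financeira" label also contains the second needle
```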

2 Answers

1

EDITED:

You are trying to find text nodes that are equal to two different values at the same time. That will never match anything. It is like asking for all summer days that are both 100% sunny and 100% rainy.

Use `or` instead of `and`, like this:

"//*[text()[normalize-space() = 'consultor imobiliario' or normalize-space() ='Consultora Imobiliária']]"  
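A minimal sketch of running this `or` expression through `DOMXPath` on made-up sample markup (the HTML, names, and the `count_matches()` helper are invented for illustration). The accented letter is written as the entity `&#225;` so the sample does not depend on how `loadHTML()` guesses the encoding of UTF-8 input that declares no charset. Because each branch is an exact comparison, a label such as 'Consultora Imobiliária e Financeira' is not counted.

```php
<?php
// Hypothetical helper: count the nodes an XPath query selects in an HTML string.
function count_matches(string $html, string $query): int
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($dom))->query($query)->length;
}

// Invented sample markup; &#225; is "á".
$html = '<html><body>'
      . '<div><h3>Ana</h3><p> Consultora Imobili&#225;ria </p></div>'
      . '<div><h3>Rui</h3><p>  consultor imobiliario  </p></div>'
      . '<div><h3>Sara</h3><p> Consultora Imobili&#225;ria e Financeira </p></div>'
      . '</body></html>';

// Each branch is a full normalize-space() comparison, joined with lowercase "or".
$query = "//*[text()[normalize-space() = 'consultor imobiliario'"
       . " or normalize-space() = 'Consultora Imobiliária']]";

echo count_matches($html, $query), PHP_EOL; // 2: the "e Financeira" label is not an exact match
```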
Siebe Jongebloed
  • that will return everything because the string literal `Consultora Imobiliária` is true-ish, you probably meant to write ```"//*[text()[normalize-space() = 'consultor imobiliario' or normalize-space() = 'Consultora Imobiliária']]"``` - notably you inherited that bug from copy-pasting OP's bugged xpath, though ^^ OP also has the same bug – hanshenrik Aug 15 '21 at 17:31
  • Thanks for the help, Siebe Jongebloed. However, the code only counts 'consultor imobiliario'; the 'Consultora Imobiliária' text is not counted. Could it be because of the accents? – Zuclaspt Aug 16 '21 at 01:02
  • If I use "//*[text()[normalize-space() = 'consultor imobiliario' or 'Consultora Imobiliária']]", the count is in the millions... but the result should be about 7. – Zuclaspt Aug 16 '21 at 01:12
  • @hanshenrik: you are correct. I have updated the answer – Siebe Jongebloed Aug 16 '21 at 09:16
  • @Zuclaspt yeah that was caused by the bug i mentioned above. the xpath has been updated, try the updated version ^^ – hanshenrik Aug 16 '21 at 09:54
  • @hanshenrik, I've put $tables = $xpath->query("//*[text()[normalize-space() = 'consultor imobiliario' or normalize-space() ='Consultora Imobiliária']]"); and it only counts 3 agents instead of 5. Only 'consultor imobiliario' is being counted :( I have to find a solution. – Zuclaspt Aug 16 '21 at 11:00
  • @Zuclaspt sigh, that's because it looks for a node containing exactly one of those 2 strings *and nothing else*. One of the tags you're looking for has the text ` Consultora Imobiliária e Financeira`, which is not *just* `Consultora Imobiliária`, so it seems you need contains(). – hanshenrik Aug 16 '21 at 11:22
  • @hanshenrik, with this code, how can I add multiple URLs to the count? Thanks for all the help! – Zuclaspt Aug 16 '21 at 13:45
  • @Zuclaspt sounds like a job for array() + foreach(...) + [xpath_quote()](https://gist.github.com/divinity76/64b0c12bcafc2150efa8ca87d2ccee52) + string concatenation, e.g. ```$xp="//*[text()[";foreach($needles as $needle){$xp.="normalize-space() = ".xpath_quote($needle)." or "; }``` ~~ – hanshenrik Aug 16 '21 at 14:51
  • @hanshenrik, sorry, I'm a noob at this... How can that code search multiple URLs? In my code I had: $url = "https://decisoesesolucoes.com/agencias/albergaria/consultores?page="; however, I also want to search "https://decisoesesolucoes.com/agencias/ABRANTES/consultores?page="; – Zuclaspt Aug 16 '21 at 17:57
  • @Zuclaspt that's irrelevant for this thread, if you're stuck on that, perhaps make a new thread about that problem. – hanshenrik Aug 16 '21 at 18:09
  • I've just opened another thread. Thanks! https://stackoverflow.com/questions/68807538/xpath-multiple-urls – Zuclaspt Aug 16 '21 at 18:20
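The array + foreach + quoting idea from the comments can be sketched as follows. Assumptions: `xpath_literal()` and `build_label_query()` are hypothetical names written here (this is not the linked gist's code); the tests use `contains()`, following the comment about 'Consultora Imobiliária e Financeira'; and the pieces are joined with `implode()` so there is no trailing `or`. XPath operator names are lowercase, so `OR` would not parse.

```php
<?php
// Hypothetical helper (not the linked gist): wrap a PHP string as an XPath 1.0 literal.
function xpath_literal(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'"; // no single quotes: wrap in single quotes
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"'; // no double quotes: wrap in double quotes
    }
    // Both quote types present: split on ' and rejoin the pieces with concat().
    return "concat('" . implode("', \"'\", '", explode("'", $s)) . "')";
}

// Hypothetical builder: one contains() test per needle, joined with lowercase "or".
// Assumes $needles is non-empty; an empty list would yield an invalid predicate.
function build_label_query(array $needles): string
{
    $tests = [];
    foreach ($needles as $needle) {
        $tests[] = 'contains(normalize-space(), ' . xpath_literal($needle) . ')';
    }
    return '//*[text()[' . implode(' or ', $tests) . ']]';
}

echo build_label_query(['consultor imobiliario', 'Consultora Imobil']), PHP_EOL;
// //*[text()[contains(normalize-space(), 'consultor imobiliario') or contains(normalize-space(), 'Consultora Imobil')]]
```

The resulting string can be passed to `$xpath->query()` in place of the hard-coded expression.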
0

I think you're looking for:

"//*[text()[contains(normalize-space(), 'consultor imobiliario') or contains(normalize-space(),'Consultora Imobil')]]"
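A minimal sketch of this `contains()` version on made-up sample markup (the HTML, names, and the `count_matches()` helper are invented for illustration). Because it matches substrings, it also counts a label like 'Consultora Imobiliária e Financeira', and the 'Consultora Imobil' prefix avoids the accented letter entirely. In the question's loop, presumably only the string passed to `$xpath->query()` would need to change.

```php
<?php
// Hypothetical helper: count the nodes an XPath query selects in an HTML string.
function count_matches(string $html, string $query): int
{
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    return (new DOMXPath($dom))->query($query)->length;
}

// Invented sample markup; &#225; is "á".
$html = '<html><body>'
      . '<div><h3>Ana</h3><p> Consultora Imobili&#225;ria </p></div>'
      . '<div><h3>Rui</h3><p>  consultor imobiliario  </p></div>'
      . '<div><h3>Sara</h3><p> Consultora Imobili&#225;ria e Financeira </p></div>'
      . '</body></html>';

// Substring tests on the whitespace-normalised text, joined with "or".
$query = "//*[text()[contains(normalize-space(), 'consultor imobiliario')"
       . " or contains(normalize-space(), 'Consultora Imobil')]]";

echo count_matches($html, $query), PHP_EOL; // 3: all three labels, including "e Financeira"
```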
hanshenrik