
I am using curl for web page scraping and I can display a result of interest.

Normally, the script below outputs the text WEB SCRAPER TESTING GROUND, which is scraped from the page by matching the element with id "title" using a regex.

Now I would like to check if the word "TESTING" is present in the $list array. If yes - just echo "present", if not - echo "not present". What is the best way to do this?

I know how to search a web page and extract text parts from it.

$curl = curl_init('http://testing-ground.scraping.pro/textlist'); // cURL setup

curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE); // return the transfer page as a string
curl_setopt($curl, CURLOPT_HEADER, TRUE);


$page = curl_exec($curl); // executing the request

if(curl_errno($curl)) // check for execution errors
{
    echo 'Scraper error: ' . curl_error($curl);
    exit;
}

curl_close($curl); // closing the connection

$regex = '/<div id="title">(.*?)<\/div>/s'; // extracting the needed part

if ( preg_match($regex, $page, $list) ) // search matches of $page with $regex
    echo $list[0];
else
    print "Not found";

1 Answer


I know how to search a web page and extract text parts from it.

Actually, extracting HTML with a regex is the wrong approach. With a DOM parser, the code should look something like

$dom = new DOMDocument(); @$dom->loadHTML($page); // loadHTML() must be called on an instance, not statically
$list[] = $dom->getElementById("title")->textContent;

If you want to learn how to properly parse HTML in PHP, read the thread "How do you parse and process HTML/XML in PHP?"
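As a rough sketch of the DOM approach, assuming the input holds only the HTML body (that is, CURLOPT_HEADER is off) and that the element id is "title" as in the question. The helper name extractTitleText and the sample markup are made up for illustration:

```php
<?php
// Hypothetical helper: returns the text of the element with id="title",
// or null when the page has no such element.
function extractTitleText(string $html): ?string
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $dom->getElementById('title');
    return $node === null ? null : trim($node->textContent);
}

// Sample markup standing in for the scraped page.
$sample = '<html><body><div id="title">WEB SCRAPER TESTING GROUND</div></body></html>';
$list = [];
$text = extractTitleText($sample);
if ($text !== null) {
    $list[] = $text;
}
```

Returning null for a missing element avoids the "property of null" error that a chained ->textContent would raise when the id is not on the page.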

Now I would like to check if the word "TESTING" is present in the $list array. If yes - just echo "present", if not - echo "not present". What is the best way to do this?

Make a boolean $found, iterate over the list with foreach, and check each entry with strpos(). Make sure to break out of the loop early once you find a match (continuing the loop after a match would be a waste of CPU and time), and finally print the result, e.g.

$found=false;
foreach($list as $foo){
    if(false!==strpos($foo,"TESTING")){ // strpos(haystack, needle)
        $found=true;
        break;
    }
}
if($found){
    echo "present";
}else{
    echo "not present";
}
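The same check can be wrapped in a small function. This is a minimal sketch: the name listContains is hypothetical, and stripos() could be swapped in if case should not matter.

```php
<?php
// Hypothetical helper: true if any entry of $list contains $needle.
function listContains(array $list, string $needle): bool
{
    foreach ($list as $entry) {
        // strpos() takes the haystack first, then the needle. Compare with
        // !== false because a match at offset 0 is falsy.
        if (strpos((string) $entry, $needle) !== false) {
            return true; // stop at the first match
        }
    }
    return false;
}

$list = ['WEB SCRAPER TESTING GROUND'];
echo listContains($list, 'TESTING') ? 'present' : 'not present';
```

Returning from inside the loop plays the same role as the break above: no further entries are examined after the first match.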
hanshenrik