How to use curl and preg_match_all to get div content

Question

I am trying to practice cURL, but it isn't going well. Please tell me what's wrong. Here is my code:

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://xxxxxxx.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
curl_setopt($ch, CURLOPT_USERAGENT, "Google Bot");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

$downloaded_page = curl_exec($ch);
curl_close($ch);
preg_match_all('/<div\s* class =\"abc\">(.*)<\/div>/', $downloaded_page, $title); 
echo "<pre>";
print($title[1]);  
echo "</pre>";

The warning I get is: Notice: Array to string conversion

The HTML I want to parse looks like this:

<div class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>

$title is not an array, but array of arrays. Look at the examples on the manual page: http://php.net/manual/en/function.preg-match-all.php — Ashalynd, Oct 13 '13 at 09:50

score 2 · Accepted Answer · answered Oct 13 '13 at 21:30

preg_match_all fills its third argument (here $title) with an array of arrays; the function itself returns the number of matches.

If your code is:

preg_match_all('/<div\s+class="abc">(.*)<\/div>/', $downloaded_page, $title);

you actually want to do the following:

echo "<pre>";
foreach ($title[1] as $realtitle) {
    echo $realtitle . "\n";
}
echo "</pre>";
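
As a rough illustration of that array-of-arrays layout, here is a minimal sketch. The sample string and the variable names ($html, $matches, $count) are made up for the example and are not from the question:

```php
<?php
// Hypothetical sample input standing in for the downloaded page.
$html = '<div class="abc">first</div>' . "\n" . '<div class="abc">second</div>';

// preg_match_all returns the number of matches and fills $matches.
$count = preg_match_all('/<div\s+class="abc">(.*?)<\/div>/s', $html, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
foreach ($matches[1] as $content) {
    echo $content, "\n";
}
```

Printing $matches[1] directly is what triggers "Array to string conversion"; looping over it (or using print_r) avoids the notice.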

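The loop is needed because preg_match_all collects every div that has class "abc". I also suggest hardening your regex: allow other attributes around class, make the capture non-greedy, and add the s modifier so that . also matches newlines (your div content spans several lines).

preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $downloaded_page, $title);

This will also match div tags that carry other attributes before or after class="abc". Note that the capture ends at the first </div>, so a div nested inside the block would cut the match short.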
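
To see how the dot and the quantifier behave on the multi-line markup from the question, here is a small sketch. The markup is copied from the question; the variable names are only illustrative:

```php
<?php
// Sample markup copied from the question.
$html = <<<HTML
<div class="abc">
<ul> blablabla </ul>
<ul> blablabla </ul>
<ul> blablabla </ul>
</div>
HTML;

// Without the s modifier, "." does not match newlines, so nothing matches here.
$without = preg_match_all('/<div[^>]+class="abc"[^>]*>(.*)<\/div>/', $html, $m1);

// With s (dot matches newline) and a lazy quantifier, the block is captured.
$with = preg_match_all('/<div[^>]+class="abc"[^>]*>(.*?)<\/div>/s', $html, $m2);

echo "without s: $without match(es)\n";
echo "with s: $with match(es)\n";
echo trim($m2[1][0]), "\n";
```

The lazy (.*?) stops at the first closing </div>, which is what you want when the page contains several such blocks, but it is also why nested divs are a problem for this approach.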

BTW: DOMDocument can be slow as hell. I found that regexes can sometimes (depending on the size of your document) be up to 40x faster. Just keep it simple.

Best, Nicolas

score 1 · Answer 2 · edited May 23 '17 at 12:30

Don't parse HTML with regex.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.lipsum.com/');
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$html = curl_exec($ch);
curl_close($ch);

$dom = new DOMDocument;
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
# foreach ($xpath->query('//div') as $div) { // all div's in html
foreach ($xpath->query('//div[contains(@class, "abc")]') as $div) { // all div's that have "abc" classname
    // $div->nodeValue contains fetched DIV content
}
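
For a version that can be run without a network request, here is a sketch along the same lines. It assumes the dom extension is available; the sample markup, the stricter class test, and the $results array are additions for illustration, not part of the answer above:

```php
<?php
// Hypothetical sample markup; in practice $html would come from curl_exec().
$html = '<div class="abc"><ul> one </ul><ul> two </ul></div><div class="abcd">other</div>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "abc" as a whole class token, so class="abcd" is not selected.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " abc ")]';

$results = [];
foreach ($xpath->query($query) as $div) {
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $dom->saveHTML($child);   // serialise each child to rebuild the inner HTML
    }
    $results[] = $inner;
}
print_r($results);
```

A plain contains(@class, "abc") would also select class="abcd", which is why the sketch pads the attribute with spaces before testing it. Use $div->nodeValue instead of the inner loop if you only need the text without tags.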
