PHP regex returns zero results

Question

I have the following PHP cURL and regex code. I'd like to get the post headings from a website. There are actually 10 articles, but the code returns zero results.

PHP:

<?php 
$ch = curl_init();
$url = "www.mahsumakbas.net";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$content = curl_exec($ch);
curl_close($ch);

@preg_match_all('/<h2 class="entry-title">(.*)<\/h2>/' ,$content, $matches); 


for ($i=0;  $i< sizeof($matches[1]); $i++)
    echo $matches[1][$i]."<br/>";

?>

On the www.mahsumakbas.net web page there are 10 <h2 class="entry-title"> elements, each closed with </h2>.

What am I missing?

What do you think will happen when they add a single extra space, or an extra attribute, to those h2's? http://stackoverflow.com/a/1732454/1067003 – hanshenrik, Jan 07 '17 at 11:43
The tl;dr is: don't use regex to parse HTML. `$results=[];$domd=new DOMDocument();@$domd->loadHTML($content);foreach($domd->getElementsByTagName("h2") as $h2){if($h2->getAttribute("class")!=="entry-title")continue;$results[]=$h2->textContent;}var_dump($results);` – hanshenrik, Jan 07 '17 at 11:45
`for ($i=0; $i< sizeof($matches[1]); $i++)` is not the way to loop over an array in PHP; use `foreach` instead. – Casimir et Hippolyte, Jan 07 '17 at 12:38

Suchit kumar · Accepted Answer · 2017-01-07T12:50:01.423

1

Try this:

$url = "www.mahsumakbas.net";
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
//curl_setopt(... other options you want...)
$html = curl_exec($c);

curl_close($c);
preg_match_all("'<h2 class=\"entry-title\">(.*?)</h2>'si" ,$html, $matches); 

foreach($matches[1] as $key=>$val)
    echo $val."<br/>";
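
The two changes that matter in the pattern above are the `s` modifier, which lets `.` match newlines, and the lazy `.*?`, which stops at the first `</h2>` instead of the last. Here is a minimal sketch of the difference. The sample markup is made up for illustration and is not the actual page source:

```php
<?php
// Hypothetical sample markup: each headline spans three lines.
$html = "<h2 class=\"entry-title\">\n<a href=\"/a\">First</a>\n</h2>\n"
      . "<h2 class=\"entry-title\">\n<a href=\"/b\">Second</a>\n</h2>\n";

// Without the "s" modifier, "." stops at newlines, so nothing matches.
$without = preg_match_all('/<h2 class="entry-title">(.*)<\/h2>/', $html, $m1);

// With "s" (dot matches newline) and a lazy ".*?", each headline matches separately.
$with = preg_match_all('/<h2 class="entry-title">(.*?)<\/h2>/si', $html, $m2);

// Strip the inner <a> tags and surrounding whitespace to get plain titles.
$titles = array_map(function ($s) {
    return trim(strip_tags($s));
}, $m2[1]);

echo $without . "\n";             // 0
echo $with . "\n";                // 2
echo implode(", ", $titles) . "\n"; // First, Second
```

With a greedy `(.*)` and the `s` modifier, the whole span from the first `<h2 ...>` to the last `</h2>` would come back as a single match, which is why the lazy quantifier is needed as well.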

edited Jan 07 '17 at 12:50

answered Jan 07 '17 at 11:40

Suchit kumar


@MahsumAkbas welcome. If the problem is solved, you can mark it as the answer. – Suchit kumar Jan 10 '17 at 11:36

score 0 · Answer 2 · answered Jan 07 '17 at 11:42

0

Your headlines span 3 lines. By default `.` does not match newlines, so you must set the "s" modifier (the "m" modifier only changes how ^ and $ behave). Maybe it helps.

http://php.net/manual/en/reference.pcre.pattern.modifiers.php

But for parsing an HTML DOM string you should use DOMDocument with getElementsByTagName.
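
A minimal sketch of the DOMDocument approach, assuming the `dom` extension is available. The sample markup and variable names are made up for illustration; in the question, `$html` would be the string returned by `curl_exec`:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<h2 class="entry-title"><a href="/a">First</a></h2>'
      . '<h2 class="other">Skip me</h2>'
      . "<h2 class=\"entry-title\">\n<a href=\"/b\">Second</a>\n</h2>"
      . '</body></html>';

$dom = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$titles = [];
foreach ($dom->getElementsByTagName('h2') as $h2) {
    // Exact comparison, like the regex; a multi-class attribute would need a looser check.
    if ($h2->getAttribute('class') !== 'entry-title') {
        continue;
    }
    $titles[] = trim($h2->textContent);
}

foreach ($titles as $title) {
    echo $title . "<br/>";
}
```

Unlike the regex, this does not depend on whether a headline sits on one line or several, or on the order of attributes inside the tag.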


Sysix


Actually, XPath would fit even better than getElementsByTagName: he doesn't want all h2 tags, he wants h2 tags that also have the `class="entry-title"` attribute. But yeah – hanshenrik Jan 07 '17 at 11:48
He could also check the class later ;) but you're right. XPath is much easier – Sysix Jan 07 '17 at 11:49
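
The XPath variant mentioned in these comments could look roughly like this. It is a sketch on made-up sample markup, not the actual page, and it assumes the `dom` extension is available:

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body>'
      . '<h2 class="entry-title"><a href="/a">First</a></h2>'
      . '<h2 class="other">Skip me</h2>'
      . "<h2 class=\"entry-title\">\n<a href=\"/b\">Second</a>\n</h2>"
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings from imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select only h2 elements whose class attribute is exactly "entry-title".
$titles = [];
foreach ($xpath->query('//h2[@class="entry-title"]') as $h2) {
    $titles[] = trim($h2->textContent);
}

foreach ($titles as $title) {
    echo $title . "<br/>";
}
```

The class filter moves into the query itself, so the loop body no longer needs the `getAttribute` check.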
