Curl and preg_match_all error

Question

I want to get the number of subscribers this channel has using cURL, but I seem to get an empty array. Any help?

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.youtube.com/channel/UCU3i-l-rqTVGQj3Q3LePhJQ");
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1");
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: es-es,en"));
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$result = curl_exec($ch);
// to show possible errors
$error = curl_error($ch);
curl_close($ch);

// parse

preg_match_all("(<a class=\"secondary-header-action\" href=\"/subscribers\" role=\"menuitem\">
        <span class=\"nav-text\">
          (.*)
        </span>
      </a>)siU", $result, $matches);

print_r($matches);

Hi, I'm just trying to learn to use cURL; this is really just an example of trying to extract data from YouTube — Sociopath, Dec 19 '16 at 20:31
Then you'd better use DOMDocument and get the value you need with it. Your regex most probably does not work because of the spacing between `>` and `<` that is meaningful — Wiktor Stribiżew, Dec 19 '16 at 20:32
@Sociopath well did you check whether the curl call worked? Really just use the api. — PeeHaa, Dec 19 '16 at 20:35
You mean the YouTube API? This is just an example. I want to learn how to extract information from websites using cURL; it doesn't matter if it's YouTube or another website... — Sociopath, Dec 19 '16 at 20:38

score 1 · Accepted Answer · answered Dec 19 '16 at 20:41

When parsing HTML, the safest approach is to use an HTML DOM parser. Here is example code that takes the $result HTML string and collects the text of every span tag with the nav-text class that sits directly inside an <a> tag with the secondary-header-action class:

$result = <<<DATA
<body>
<a class="secondary-header-action" href="/subscribers" role="menuitem">
<span class="nav-text">Some text here</span>
</a>
</body>
DATA;

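// Build a DOM tree from the HTML string
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($result, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Select each span.nav-text that is a direct child of a.secondary-header-action
$xpath = new DOMXPath($dom);
$spans = $xpath->query('//a[@class="secondary-header-action"]/span[@class="nav-text"]');
$res = array();

// Collect the text content of every matched span
foreach ($spans as $span) {
    $res[] = $span->nodeValue;
}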

print_r($res); // => Array ( [0] => Some text here )

DOMDocument builds the DOM tree from the HTML string, and DOMXPath selects the needed elements from that tree with XPath expressions.
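For markup closer to what a live page returns, the same DOMDocument and DOMXPath approach can be wrapped in a small function. The following is only a sketch under a few assumptions: extractNavTexts() is a hypothetical helper name, the sample markup and its "1,234 subscribers" text are made up, and the class test uses the common contains(concat(...)) XPath idiom so that elements carrying extra classes still match. libxml_use_internal_errors() is used because real-world HTML usually triggers parser warnings.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the trimmed
// text of each span.nav-text directly inside a.secondary-header-action.
function extractNavTexts(string $html): array
{
    // loadHTML() rejects an empty string, e.g. after a failed curl_exec()
    if ($html === '') {
        return [];
    }

    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Match on a class token, so class="nav-text other" would also match
    $query = '//a[contains(concat(" ", normalize-space(@class), " "), " secondary-header-action ")]'
           . '/span[contains(concat(" ", normalize-space(@class), " "), " nav-text ")]';

    $texts = [];
    foreach ($xpath->query($query) as $span) {
        // trim() drops the indentation and line breaks around the text
        $texts[] = trim($span->nodeValue);
    }

    return $texts;
}

// Made-up sample with the same line breaks and indentation as in the question
$sample = '<body><a class="secondary-header-action" href="/subscribers" role="menuitem">
        <span class="nav-text">
          1,234 subscribers
        </span>
      </a></body>';

print_r(extractNavTexts($sample)); // => Array ( [0] => 1,234 subscribers )
```

With the cURL code from the question, this would be called as extractNavTexts((string) $result) after checking $error, since curl_exec() returns false on failure. Unlike the regex, the whitespace between the tags does not affect the match.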
