I want to get this channel's subscriber count with cURL, but I seem to get an empty array. Any help?

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.youtube.com/channel/UCU3i-l-rqTVGQj3Q3LePhJQ");
curl_setopt($ch, CURLOPT_USERAGENT,"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1");
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: es-es,en"));
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

$result = curl_exec($ch);
// to show possible errors
$error = curl_error($ch);
curl_close($ch);

// parse

preg_match_all("(<a class=\"secondary-header-action\" href=\"/subscribers\" role=\"menuitem\">
        <span class=\"nav-text\">
          (.*)
        </span>
      </a>)siU", $result, $matches);

print_r($matches);
Sociopath

1 Answer

The safest way to parse HTML is with an HTML DOM parser rather than a regular expression. The example below takes the HTML string in $result and collects the text of every <span> with the nav-text class that sits directly inside an <a> with the secondary-header-action class:

$result = <<<DATA
<body>
<a class="secondary-header-action" href="/subscribers" role="menuitem">
<span class="nav-text">Some text here</span>
</a>
</body>
DATA;

// Build a DOM from the HTML without adding implied <html>/<body> elements or a default doctype
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($result, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Select every span.nav-text that is a direct child of a.secondary-header-action
$xpath = new DOMXPath($dom);
$spans = $xpath->query('//a[@class="secondary-header-action"]/span[@class="nav-text"]');
$res = array();

foreach ($spans as $span) {
   $res[] = $span->nodeValue;
}

print_r($res); // => Array ( [0] => Some text here )


DOMDocument parses the HTML string into a DOM tree, and DOMXPath queries that tree with XPath expressions to reach the required elements.
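One caveat about the XPath used above: @class="..." compares the whole attribute value, so it stops matching as soon as an element carries more than one class. The sketch below shows a token-based class test that is often used instead. The markup, the extra class names (active, bold) and the hasClass helper are made up for illustration and are not taken from the real page.

```php
<?php
// Hypothetical markup: both elements carry an extra class, so an exact
// @class="..." comparison no longer matches them.
$html = <<<HTML
<body>
<a class="secondary-header-action active" href="/subscribers" role="menuitem">
<span class="nav-text bold">Some text here</span>
</a>
</body>
HTML;

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Illustrative helper: builds an XPath 1.0 predicate that is true when
// the space-separated class list contains $class as a whole token.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Exact-value comparison: finds nothing because of the extra classes.
$exact = $xpath->query('//a[@class="secondary-header-action"]/span[@class="nav-text"]');

// Token-based comparison: matches regardless of other classes.
$spans = $xpath->query(
    '//a[' . hasClass('secondary-header-action') . ']/span[' . hasClass('nav-text') . ']'
);

$texts = [];
foreach ($spans as $span) {
    $texts[] = trim($span->textContent);
}

print_r($texts);
```

Calling libxml_use_internal_errors(true) is optional, but it keeps the warnings that real-world HTML tends to trigger out of the output.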

Wiktor Stribiżew