Assuming you want to extract all anchor tags with their hyperlinks from the given page.
Now there are a couple of problems with doing file_get_contents on that URL:
- the response may be compressed, e.g. with gzip
- SSL verification of the URL
To overcome the first problem (gzip compression), we'll use cURL, as @gregn3 suggested in his answer. But he missed using cURL's ability to automatically decompress gzipped content.
For the second problem, you can either follow this guide or disable SSL verification through cURL's curl_setopt options.
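If you'd rather not disable verification, a safer alternative is to point cURL at a CA certificate bundle. This is only a sketch; the bundle path below is an assumption and must be adjusted for your system (on many setups you can download cacert.pem from the cURL project):

```php
// Keep SSL verification on and supply a CA bundle explicitly.
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 2);
// Hypothetical path -- replace with the actual location of your CA bundle.
curl_setopt($c, CURLOPT_CAINFO, "/path/to/cacert.pem");
```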
Now the code which extracts all the links from the given page is:
<?php
$url = "https://qc.yahoo.com/";

# download the resource
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
# makes cURL send the Accept-Encoding header and decompress the response automatically
curl_setopt($c, CURLOPT_ENCODING, "gzip");
curl_setopt($c, CURLOPT_VERBOSE, 1);
# disables SSL verification -- see the note above before using this in production
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 0);
$content = curl_exec($c);
curl_close($c);

# extract every double-quoted href value
preg_match_all("/href=\"([^\"]+)\"/i", $content, $matches);
# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
$n++;
echo "$n: " . htmlspecialchars ($link) . "<br>";
}
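Note that the regex above only matches double-quoted href attributes and can break on single-quoted or unquoted ones. A more robust alternative, using only PHP's built-in DOM extension, might look like this (a sketch; the sample HTML string stands in for the $content fetched above):

```php
<?php
// Sample HTML standing in for the page content fetched with cURL.
$content = '<html><body><a href="https://example.com/a">A</a><a href=\'/b\'>B</a></body></html>';

$doc = new DOMDocument();
// Real-world HTML is often malformed; suppress libxml warnings.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

// Collect the href attribute of every anchor tag.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}
print_r($links);
```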
But if you want to do more advanced HTML parsing, then you'll need to use PHP Simple HTML DOM Parser. In PHP Simple HTML DOM you can select elements using jQuery-like selectors and fetch the anchor tags. Here are its documentation & API manual.
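With that library, extracting the links could look roughly like this. This is a sketch based on the library's documented API; it assumes you have downloaded simple_html_dom.php into your project:

```php
<?php
// Assumes the Simple HTML DOM library file is present alongside this script.
include 'simple_html_dom.php';

$html = file_get_html('https://qc.yahoo.com/');
// 'a' is a jQuery-like selector matching every anchor tag.
foreach ($html->find('a') as $anchor) {
    echo $anchor->href . "<br>";
}
```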