I'm parsing an HTML page and scraping its links:
```php
function get_links($url) {
    // Create a new DOM Document to hold our webpage structure
    $xml = new DOMDocument();
    // Load the url's contents into the DOM
    $xml->loadHTMLFile($url);
    // Empty array to hold all links to return
    $links = array();
    // Loop through each <a> tag in the DOM and add it to the link array
    foreach ($xml->getElementsByTagName('a') as $link) {
        $links[] = array('url' => $link->getAttribute('href'));
    }
    // Return the links
    return $links;
}

$arrayLinks = get_links($url);
```
The only problem I'm facing is that some links are not absolute URLs, for example `/image/1.jpg` or `//example.com`. That is exactly how they are returned and stored in my array. Is there a way in PHP to get these links as FULL URLs? For the examples above:
Instead of `/image/1.jpg`, it would be `https://example.com/image/1.jpg`.

Instead of `//example.com`, it would be `https://example.com`.
NOTE: I know that in JavaScript this can be done using `element.href`, but is there anything equivalent in PHP, preferably something I can use with the example above?
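`DOMElement::getAttribute()` returns the `href` exactly as written in the markup, so in PHP the resolution has to be done explicitly against the page URL. Below is a minimal sketch, assuming PHP 8 and an absolute http(s) page URL. The helpers `resolve_url()`, `remove_dot_segments()` and `get_links_from_html()` are hypothetical names, not built-in functions; `resolve_url()` follows a simplified version of the reference-resolution steps in RFC 3986 section 5.2 (any `user:password@` part of the base is ignored), and `get_links_from_html()` also honours a `<base href>` element, as browsers do.

```php
<?php
// Sketch: resolve a (possibly relative) href against an absolute base URL, following
// RFC 3986 section 5.2. Assumes $base is an absolute http(s) URL; any
// "user:password@" part of $base is ignored for brevity.
function resolve_url(string $base, string $ref): string
{
    $ref = trim($ref);
    // Already absolute ("https:...", "mailto:...", ...): return unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $ref)) {
        return $ref;
    }
    $b = parse_url($base);
    $scheme = $b['scheme'];
    $authority = $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    // Protocol-relative ("//example.com"): borrow only the base scheme.
    if (strncmp($ref, '//', 2) === 0) {
        return $scheme . ':' . $ref;
    }
    // Split "#fragment" and then "?query" off the reference.
    $fragment = '';
    if (($pos = strpos($ref, '#')) !== false) {
        $fragment = substr($ref, $pos);
        $ref = substr($ref, 0, $pos);
    }
    $query = null;
    if (($pos = strpos($ref, '?')) !== false) {
        $query = substr($ref, $pos);
        $ref = substr($ref, 0, $pos);
    }
    $basePath = $b['path'] ?? '/';
    if ($ref === '') {
        // Only "?query" and/or "#fragment": keep the base path (and base query).
        $path = $basePath;
        $query ??= isset($b['query']) ? '?' . $b['query'] : '';
    } elseif ($ref[0] === '/') {
        // Root-relative ("/image/1.jpg"): replaces the whole base path.
        $path = $ref;
    } else {
        // Document-relative ("img.png", "../x"): replaces the last base segment.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $ref;
    }
    return $scheme . '://' . $authority . remove_dot_segments($path) . ($query ?? '') . $fragment;
}

// Collapse "." and ".." segments in an absolute path, e.g. "/a/b/../c" -> "/a/c".
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            // Never pop the leading empty segment that stands for the root "/".
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    // A trailing "." or ".." still denotes a directory, so keep a trailing slash.
    if (in_array(end($segments), ['.', '..'], true)) {
        $out[] = '';
    }
    return implode('/', $out);
}

// Hypothetical variant of get_links(): takes the HTML and the page URL, honours a
// <base href> element the way browsers do, and skips <a> tags without an href.
function get_links_from_html(string $html, string $pageUrl): array
{
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // silence warnings on real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $base = $pageUrl;
    $baseTag = $doc->getElementsByTagName('base')->item(0);
    if ($baseTag !== null && $baseTag->getAttribute('href') !== '') {
        $base = resolve_url($pageUrl, $baseTag->getAttribute('href'));
    }
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        if ($link->hasAttribute('href')) {
            $links[] = ['url' => resolve_url($base, $link->getAttribute('href'))];
        }
    }
    return $links;
}

// Drop-in replacement for the original get_links($url); needs allow_url_fopen for URLs.
function get_links(string $url): array
{
    $html = file_get_contents($url);
    return ($html === false || $html === '') ? [] : get_links_from_html($html, $url);
}
```

With this, `$arrayLinks = get_links($url);` on a page under `https://example.com` would turn `/image/1.jpg` into `https://example.com/image/1.jpg` and `//example.com` into `https://example.com`. One caveat: if the request is redirected, browsers resolve links against the final URL, whereas this sketch uses `$url` as given.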