I'm parsing an HTML page and scraping its links:

function get_links($url) { 

    // Create a new DOM Document to hold our webpage structure 
    $xml = new DOMDocument(); 

    // Load the url's contents into the DOM 
    $xml->loadHTMLFile($url); 

    // Empty array to hold all links to return 
    $links = array(); 

    //Loop through each <a> tag in the dom and add it to the link array 
    foreach($xml->getElementsByTagName('a') as $link) { 
        $links[] = array('url' => $link->getAttribute('href')); 
    } 

    //Return the links 
    return $links; 
} 

$arrayLinks = get_links($url);

The only problem is that some of the links are relative rather than full URLs, for example:

/image/1.jpg

or,

//example.com

They are returned and stored in my array exactly as they appear in the markup. Is there a way to extract these links in PHP so that the full, absolute URL is returned? For the examples above:

Instead of /image/1.jpg, it would be https://example.com/image/1.jpg.

Instead of //example.com, it would be https://example.com.

NOTE: I know that in JavaScript this can be done using element.href, but is there an equivalent in PHP, preferably something I can use with the example above?
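For reference, here is a minimal sketch of the kind of resolution being asked about, written in plain PHP. It assumes the page URL passed to get_links() can serve as the base, and it ignores any `<base href>` tag in the page. The name resolve_url() is hypothetical (it is not a PHP built-in), and the logic is a simplified take on relative reference resolution, not a complete RFC 3986 implementation.

```php
<?php
// Hypothetical helper: resolve $href against the page URL $base.
// Simplified sketch; edge cases of RFC 3986 are not all covered.
function resolve_url(string $base, string $href): string
{
    // Already absolute (has a scheme such as https: or mailto:): keep as-is.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }
    // Empty href refers to the page itself.
    if ($href === '') {
        return $base;
    }

    $parts    = parse_url($base);
    $scheme   = $parts['scheme'] ?? 'https'; // assumption: default to https
    $port     = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin   = $scheme . '://' . ($parts['host'] ?? '') . $port;
    $basePath = $parts['path'] ?? '/';

    // Protocol-relative (//example.com): reuse the base scheme.
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href;
    }
    // Fragment-only or query-only references keep the base path.
    if ($href[0] === '#') {
        return explode('#', $base, 2)[0] . $href;
    }
    if ($href[0] === '?') {
        return $origin . $basePath . $href;
    }

    // Root-relative (/image/1.jpg) or document-relative (img/1.jpg).
    $path = $href[0] === '/'
        ? $href
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;

    // Set aside any query string or fragment before normalizing the path.
    $cut    = strcspn($path, '?#');
    $suffix = (string) substr($path, $cut);
    $path   = substr($path, 0, $cut);

    // Collapse "." and ".." segments without climbing above the root.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }

    return $origin . implode('/', $out) . $suffix;
}
```

With a helper like this, the loop in get_links() could store resolve_url($url, $link->getAttribute('href')) instead of the raw attribute value.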

LatentDenis