I'm trying to scrape some websites using cURL. To make the relative URLs in the scraped page resolve against the original site, I insert a base tag right after the opening head tag:
$curl_scraped_page = preg_replace("/<head>/i", "<head><base href='$url' />", $curl_scraped_page, 1);
It works well for most websites, but not all of them. For instance, on this website ("NS Website") it has no effect at all: the relative URLs are still resolved against my own domain, e.g. mydomain.com/css.css
This is the complete code I'm using:
<?php
$url = $_GET['url'];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT,2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
$curl_scraped_page = preg_replace("/<head>/i", "<head><base href='$url' />", $curl_scraped_page, 1);
curl_close($ch);
echo $curl_scraped_page;
?>
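
One possible cause, not verified against the site mentioned above: the pattern /<head>/i only matches a bare <head> tag, so a page whose opening tag carries attributes (for example <head profile="...">) would be left unchanged. A minimal sketch of a more tolerant replacement follows; the function name insert_base_tag is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper (not part of the original code): insert a <base> tag
// right after the opening <head ...> tag, with or without attributes.
function insert_base_tag(string $html, string $base_url): string
{
    // Escape the URL so quotes or ampersands cannot break the attribute.
    $base = "<base href='" . htmlspecialchars($base_url, ENT_QUOTES) . "' />";

    // <head\b[^>]*> matches <head>, <head profile="...">, and so on,
    // but not <header>, because \b requires a word boundary after "head".
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($base) {
            return $m[0] . $base; // keep the original tag, append <base>
        },
        $html,
        1 // only the first match
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $html;
}
```

In the script above this would be called as insert_base_tag($curl_scraped_page, $url) in place of the preg_replace line. If the site answers with a redirect, it may also help to set CURLOPT_FOLLOWLOCATION to true and pass curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) as the base URL instead of $url, since relative paths belong to the final location.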