If I download a page using file_get_contents(), the result looks like this:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta content="no-cache">
<link href="/style.css?q=245f90373d16694462b99d01a6a3eac8" rel="stylesheet" type="text/css">
<script type='text/javascript' src='/page?lang=en'></script>
</head>
<body>
<a href="/page?q=245f90373d16694462b99d01a6a3eac8"></a>
<a href="./page?q=245f90373d16694462b99d01a6a3eac8"></a>
<a href="http://example.com/page?q=245f90373d16694462b99d01a6a3eac8"></a>
<a href="https://example.com/page?q=245f90373d16694462b99d01a6a3eac8"></a>
</body>
[...]
</html>

I need a function that converts every href and src to an absolute URL of the form http://example2.org/*

Note that some attributes are quoted with " and others with '

Is there a simple way to do this, for example with preg_replace()?

1 Answer

Solved it using a DOM parser (str_get_html() comes from PHP Simple HTML DOM Parser); it works well.

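When I ran it, the single-quoted src in

<script type='text/javascript' src='/page?lang=en'></script>

was not rewritten. I assumed the quoting was an error, although single-quoted attribute values are valid HTML. The site is not mine, so I cannot change the markup. The code is: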

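// $url is the address the page was downloaded from;
// $yourpagecontent is the HTML returned by file_get_contents($url).
$html = str_get_html($yourpagecontent);
foreach($html->find('a[href]') as $element) {
        $tryer=$element->href;
        // Root-relative link ("/page"): prefix the scheme and host of $url.
        if ( substr($tryer, 0, strlen('/')) === '/') {
            $element->href = parse_url($url, PHP_URL_SCHEME)."://".parse_url($url, PHP_URL_HOST).$tryer;
        }

        // Link relative to the current directory ("./page"): replace the leading "." with the directory of $url.
        if ( substr($tryer, 0, strlen('./')) === './') {
            $element->href = dirname($url).substr($tryer, 1);
        }
}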

// Same rewrite for stylesheet links.
foreach($html->find('link[href]') as $element) {
        $tryer=$element->href;
        if ( substr($tryer, 0, strlen('/')) === '/') {
            $element->href = parse_url($url, PHP_URL_SCHEME)."://".parse_url($url, PHP_URL_HOST).$tryer;
        }

        if ( substr($tryer, 0, strlen('./')) === './') {
            $element->href = dirname($url).substr($tryer, 1);
        }
}

// script elements carry their URL in src, not href, so read and write src here.
foreach($html->find('script[src]') as $element) {
        $tryer=$element->src;
        if ( substr($tryer, 0, strlen('/')) === '/') {
            $element->src = parse_url($url, PHP_URL_SCHEME)."://".parse_url($url, PHP_URL_HOST).$tryer;
        }

        if ( substr($tryer, 0, strlen('./')) === './') {
            $element->src = dirname($url).substr($tryer, 1);
        }
}
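
The three loops above repeat the same logic, so they could be folded into one helper. What follows is only a rough sketch under some assumptions: it uses PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM Parser, the function names absolutize() and rewriteLinks() are made up for this example, and it does not handle "../" segments or a <base> tag. Pass whichever base URL you want the links to point at.

```php
<?php
// Hypothetical helper: resolve one link against the base URL.
function absolutize(string $link, string $base): string
{
    // Leave empty values and pure fragments ("#top") untouched.
    if ($link === '' || $link[0] === '#') {
        return $link;
    }
    // Already absolute ("http:", "mailto:", ...) or protocol-relative ("//host/...").
    if (preg_match('#^([a-z][a-z0-9+.-]*:|//)#i', $link)) {
        return $link;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    // Root-relative link ("/page"): append to scheme and host.
    if (strpos($link, '/') === 0) {
        return $origin . $link;
    }

    // Document-relative link ("./page" or "page"): resolve against the
    // directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    if (strpos($link, './') === 0) {
        $link = substr($link, 2);
    }
    return $origin . $dir . $link;
}

// Hypothetical helper: rewrite every href and src attribute in the page.
function rewriteLinks(string $html, string $base): string
{
    $doc = new DOMDocument();
    // Real-world markup is often invalid; collect libxml warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href or @src]') as $element) {
        foreach (['href', 'src'] as $name) {
            if ($element->hasAttribute($name)) {
                $element->setAttribute($name, absolutize($element->getAttribute($name), $base));
            }
        }
    }
    return $doc->saveHTML();
}
```

Usage would be something like echo rewriteLinks(file_get_contents($url), $url);. Because the parser reads attribute values regardless of quote style, src='...' and href="..." are treated the same, which is the main reason to prefer this over preg_replace(). Note that the serializer writes attributes back with double quotes and escapes & as &amp; in attribute values.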