45

I'm using file_get_contents() to grab content from a site, and amazingly it works even if the URL I pass as argument redirects to another URL.

The problem is that I need to know the new URL. Is there a way to do that?

Martin Prikryl
HappyDeveloper

5 Answers

72

If you need to use file_get_contents() instead of curl, don't follow redirects automatically. The redirect target is then available in the Location line of $http_response_header:

$context = stream_context_create(
    array(
        'http' => array(
            'follow_location' => false
        )
    )
);

$html = file_get_contents('http://www.example.com/', false, $context);

var_dump($http_response_header);

Answer inspired by: How do I ignore a moved-header with file_get_contents in PHP?
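
To pull the redirect target out of the dumped headers, something along these lines may help. It is a minimal sketch, not part of the original answer: the function name find_location_header and the sample header list are made up for illustration, and it assumes the headers arrive as a flat list of "Name: value" strings, as $http_response_header provides.

```php
<?php
// Hypothetical helper: returns the value of the last Location header in a
// header list shaped like $http_response_header, or null if there is none.
function find_location_header(array $headers): ?string
{
    $location = null;
    foreach ($headers as $header) {
        // Header names are case-insensitive; the value follows the colon.
        if (preg_match('/^Location:\s*(.*)$/i', $header, $matches)) {
            $location = trim($matches[1]);
        }
    }
    return $location;
}

// Sample data for illustration only; real data comes from $http_response_header.
$sample = array(
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.example.com/new',
    'Content-Type: text/html',
);
echo find_location_header($sample), "\n"; // prints http://www.example.com/new
```

After the file_get_contents() call above, the same helper could be called as find_location_header($http_response_header). A null result would mean the response was not a redirect.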

Community
Jakub Zalas
  • 7
    where did you get `$http_response_header`? – Petr Peller Apr 12 '13 at 11:03
  • 10
    @PetrPeller This is a PHP special variable: http://php.net/manual/en/reserved.variables.httpresponseheader.php – Jakub Zalas Apr 13 '13 at 01:58
  • 3
    I tried this, and while it does stop the redirect as per the question linked at the end of this answer, it does not provide the "real URL" as requested in this question. It could also be that the server I'm trying this with doesn't support it, though. It appears to me though the curl() method is the only reliable way. – Rob Porter Nov 14 '14 at 14:56
  • 3
    @RPorter You need to extract the 301 Location inside of `$http_response_header`. – mgutt Mar 27 '15 at 01:00
20

You might make a request with cURL instead of file_get_contents().

Something like this should work...

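$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow the redirect
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
$a = curl_exec($ch);

// Header names are case-insensitive, so match "Location" in any letter case.
// $l stays null if the request failed or the response was not a redirect.
$l = null;
if ($a !== false && preg_match('#^Location:\s*(.*)$#mi', $a, $r)) {
    $l = trim($r[1]);
}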

Source

Jakub Zalas
alex
  • 1
    Thanks.. But where do I get the html output then? Inside $a I can only see the headers. Is it possible to get everything with only one request? edit: okay that was stupid. Now I get it, there are going to be 2 requests anyway =D. Thanks! – HappyDeveloper Dec 01 '10 at 12:09
  • 1
    CURL is not available in Google App engine so the answer does not help if you need to use file_get_contents() – c.. Dec 23 '14 at 21:26
  • 1
    @ixlli Right. I must have missed where the OP says *answers must apply only to the Google App Engine environment*. – alex Dec 24 '14 at 05:24
  • 3
    @alex heh... i think the point is that he's asking about file_get_contents() so when google searching for the problem this is what you find. – c.. Dec 24 '14 at 14:33
18

Everything in one function:

function get_web_page( $url ) {
    $res = array();
    $options = array( 
        CURLOPT_RETURNTRANSFER => true,     // return web page 
        CURLOPT_HEADER         => false,    // do not return headers 
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects 
        CURLOPT_USERAGENT      => "spider", // who am i 
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect 
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect 
        CURLOPT_TIMEOUT        => 120,      // timeout on response 
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects 
    ); 
    $ch      = curl_init( $url ); 
    curl_setopt_array( $ch, $options ); 
    $content = curl_exec( $ch ); 
    $err     = curl_errno( $ch ); 
    $errmsg  = curl_error( $ch ); 
    $header  = curl_getinfo( $ch ); 
    curl_close( $ch ); 

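    $res['content'] = $content;        // page body, or false if the request failed
    $res['url']     = $header['url'];  // last effective URL, after any redirects
    $res['errno']   = $err;            // cURL error code, 0 on success
    $res['errmsg']  = $errmsg;         // cURL error message, empty on success
    return $res;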
}  
print_r(get_web_page("http://www.example.com/redirectfrom")); 
Renaud
  • Be careful. CURLOPT_FOLLOWLOCATION is not allowed when "open_basedir" is set or when safe_mode is enabled. – czjvic Feb 27 '14 at 13:31
6

A complete solution using the bare file_get_contents (note the in-out $url parameter):

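function get_url_contents_and_final_url(&$url, $max_redirects = 10)
{
    // The context does not change between requests, so build it once.
    $context = stream_context_create(
        array(
            "http" => array(
                "follow_location" => false,
            ),
        )
    );
    $pattern = "/^Location:\s*(.*)$/i";

    do
    {
        $result = file_get_contents($url, false, $context);

        // $http_response_header is only set if the server sent a response.
        $headers = isset($http_response_header) ? $http_response_header : array();
        $location_headers = preg_grep($pattern, $headers);

        // Follow the redirect unless the limit is used up, which guards
        // against redirect loops.
        if (!empty($location_headers) && $max_redirects-- > 0 &&
            preg_match($pattern, array_values($location_headers)[0], $matches))
        {
            $url = $matches[1];
            $repeat = true;
        }
        else
        {
            $repeat = false;
        }
    }
    while ($repeat);

    return $result;
}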

Note that this works only with an absolute URL in the Location header. If you need to support relative URLs, see PHP: How to resolve a relative url.

For example, if you use the solution from the answer by @Joyce Babu, replace:

            $url = $matches[1];

with:

            $url = getAbsoluteURL($matches[1], $url);
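
The getAbsoluteURL function is not shown in this thread, so as a rough idea of what such a resolver does, here is a simplified sketch. The name resolve_location is hypothetical, and the code is deliberately limited: it assumes the base URL is absolute with a scheme and host, and it does not handle "../" segments, query-only references, or user info. A full resolver would follow RFC 3986.

```php
<?php
// Hypothetical, simplified resolver for a Location value against the URL
// that produced it. Limitations: no "../" handling, no query-only
// references, no user info in the base URL.
function resolve_location(string $location, string $base): string
{
    // Already absolute (has a scheme): use as-is.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    $parts = parse_url($base);
    $scheme = $parts['scheme'];
    $authority = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Scheme-relative: "//host/path" inherits the scheme of the base.
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;
    }
    // Root-relative: "/path" replaces the whole path of the base.
    if (strpos($location, '/') === 0) {
        return $scheme . '://' . $authority . $location;
    }
    // Path-relative: replace everything after the last slash of the base path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $authority . $dir . $location;
}

echo resolve_location('/new', 'http://www.example.com/a/b'), "\n"; // prints http://www.example.com/new
```

With this sketch, the replacement line in the function above would read $url = resolve_location($matches[1], $url); instead.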
Martin Prikryl
1

I use get_headers($url, 1);

In my case, the redirect URL is in get_headers($url, 1)['Location'][1];
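
The index [1] depends on how many redirects happened: with a single redirect, get_headers() returns Location as a plain string, and with several it returns an array. A small helper can cover both cases. This is a sketch under those assumptions, and the name last_location is made up for illustration.

```php
<?php
// Hypothetical helper: picks the final redirect target out of the array
// returned by get_headers($url, 1). Returns null if there was no redirect.
function last_location(array $headers): ?string
{
    // Servers may send "location" in any letter case, so normalise the keys.
    $headers = array_change_key_case($headers, CASE_LOWER);
    if (!isset($headers['location'])) {
        return null;
    }
    // One redirect yields a string; several yield an array in request order.
    $location = $headers['location'];
    return is_array($location) ? (string) end($location) : (string) $location;
}

// Sample data for illustration only, shaped like get_headers($url, 1) output.
$sample = array(
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Location' => array('http://www.example.com/a', 'http://www.example.com/b'),
    1 => 'HTTP/1.1 302 Found',
    2 => 'HTTP/1.1 200 OK',
);
echo last_location($sample), "\n"; // prints http://www.example.com/b
```

Note that get_headers() returns false on failure, so check for that before passing the result to the helper.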

kostikovmu