If I have a URL (e.g. http://www.foo.com/alink.pl?page=2), I want to determine whether it redirects to another URL. I'd also like to know the final URL (e.g. http://www.foo.com/other_link.pl).

I want to know how to do that in PHP.

Thank you all for your help :)

(More information: I want a function called doesItDirect($url) that returns the URL it redirects to if there is a redirect, and returns the same URL that was passed in if there is none.)

user220755
  • Can you provide context? Are you using `curl` or `file_get_contents` or is "foo.com" your site with your PHP code... in that case, why don't you know what is redirecting? – Doug Neiner Jan 07 '10 at 06:45
  • No, what I want is this: someone gives me a website URL, for example. I want a function called doesItDirect($url) that returns the URL it redirects to if it redirects, and returns the same URL that was passed if it does not. – user220755 Jan 07 '10 at 06:57

3 Answers

If you're using cURL, you can enable CURLOPT_FOLLOWLOCATION and then call curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) after the request, as documented here: http://sg.php.net/manual/en/function.curl-getinfo.php

Example:

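<?php
    $ch = curl_init('http://www.foo.com/alink.pl?page=2');
    // Follow Location headers, and return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    curl_exec($ch);

    // $url stays null if the request failed, so the echo below never hits an undefined variable
    $url = null;
    if (!curl_errno($ch)) {
        $url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    }

    curl_close($ch);

    echo $url;
?>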
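For the `doesItDirect($url)` function described in the question, the snippet above could be wrapped roughly as follows. This is a minimal sketch, not a definitive implementation: the redirect limit and the timeouts are assumed values, and it uses `CURLINFO_REDIRECT_COUNT` to decide whether a redirect happened, because the effective URL can differ from the input through normalization alone.

```php
<?php
// Sketch of the doesItDirect() function requested in the question.
// Assumptions: at most 10 redirects are followed, 5 s connect / 10 s total timeout.
function doesItDirect(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP Location headers
        CURLOPT_RETURNTRANSFER => true,  // do not print the response body
        CURLOPT_MAXREDIRS      => 10,    // assumed limit, guards against loops
        CURLOPT_CONNECTTIMEOUT => 5,     // assumed timeout in seconds
        CURLOPT_TIMEOUT        => 10,    // assumed timeout in seconds
    ]);

    // On any transfer error, report "no redirect" by returning the input URL
    if (curl_exec($ch) === false) {
        return $url;
    }

    // Only report a different URL when cURL actually followed a redirect
    if (curl_getinfo($ch, CURLINFO_REDIRECT_COUNT) > 0) {
        return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    }

    return $url;
}
```

Like the original snippet, this only sees redirects sent as HTTP headers; meta-refresh and JavaScript redirects go unnoticed.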
K Prime
  • It is not working. I tried to implement it and it breaks the code. Do you have any idea why? – user220755 Jan 07 '10 at 07:23
  • Not if you just write "it breaks the code". You need to give more details if you want people to be able to help you. Do you have cURL installed? – mpen Jan 07 '10 at 07:29
  • I didn't get any details. When I use the code you just provided, it just gives me a blank page (the code does not run). – user220755 Jan 07 '10 at 07:33
  • The code had typos - I've just fixed them, and added an `echo` to see the final URL – K Prime Jan 07 '10 at 07:41
  • It catches bit.ly urls for examples, but it does not catch this: – user220755 Jan 07 '10 at 07:58
  • Yea, it only works with HTTP headers, not the `http-equiv` meta tags - you might have to parse the HTML to get that – K Prime Jan 07 '10 at 08:29

You'll need to make an HTTP request to the URL and check the response headers you get. A 301 or 302 response means it's a redirect (303, 307 and 308 are redirect codes as well). The redirect target is included in the response headers and looks like Location: <url>.
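As a minimal sketch of that check, the function below works on header lines that have already been fetched, in the format PHP's `get_headers()` or `$http_response_header` provide. The function name `findRedirectTarget` is hypothetical, and only the first response in the list is inspected.

```php
<?php
// Hypothetical helper: returns the Location target if the first response
// in $headerLines is a redirect, or null otherwise.
function findRedirectTarget(array $headerLines): ?string
{
    // The first line is the status line, e.g. "HTTP/1.1 301 Moved Permanently"
    if (!isset($headerLines[0])
        || !preg_match('#^HTTP/\S+\s+(\d{3})#', $headerLines[0], $m)) {
        return null;
    }

    // Status codes that redirect via a Location header
    if (!in_array((int) $m[1], [301, 302, 303, 307, 308], true)) {
        return null;
    }

    // Header names are case-insensitive, so match "Location:" loosely
    foreach (array_slice($headerLines, 1) as $line) {
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }

    return null;
}

// Example with hand-written header lines (no network access needed)
var_dump(findRedirectTarget([
    'HTTP/1.1 301 Moved Permanently',
    'Location: http://www.foo.com/other_link.pl',
]));
```

The returned value may be a relative URL, since servers are allowed to send one in Location; resolving it against the requested URL is left out of this sketch.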

Update: the manual provided a useful example, from which I put together this, which seems to work:

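<?php
// Returns the Location target if $url answers with a redirect status, false otherwise.
function isRedirectUrl($url) {
    $redirectCodes = array(301, 302, 303, 307, 308);
    // Do not follow redirects automatically, so each hop can be inspected on its own
    $context = stream_context_create(array('http' => array('follow_location' => 0)));

    $fp = fopen($url, 'r', false, $context);
    if (!$fp) {
        return false;
    }

    $meta = stream_get_meta_data($fp);
    // Close the stream before any return so the handle is never leaked
    fclose($fp);

    // The first header line is the status line, e.g. "HTTP/1.1 301 Moved Permanently"
    $status = explode(' ', $meta['wrapper_data'][0], 3);
    $code = isset($status[1]) ? intval($status[1]) : 0;

    if (in_array($code, $redirectCodes)) {
        foreach ($meta['wrapper_data'] as $header) {
            // Skip lines without a colon; header names are case-insensitive
            $parts = explode(':', $header, 2);
            if (count($parts) == 2 && strcasecmp(trim($parts[0]), 'Location') == 0) {
                return trim($parts[1]);
            }
        }
    }

    return false;
}

// Follows redirects one hop at a time, up to $maxHops, and returns the last URL reached.
function getCanonicalUrl($url, $maxHops = 10) {
    $ret = $url;
    while ($maxHops-- > 0 && ($test = isRedirectUrl($ret))) {
        $ret = $test;
    }

    return $ret;
}

var_dump(getCanonicalUrl('http://<url to test>'));
?>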
nikc.org
  • Could you give me an outline of how that would work so I can start on it? Thank you for the help! – user220755 Jan 07 '10 at 07:28
  • If you have the [HTTP](http://www.php.net/manual/en/book.http.php) extension in your PHP installation, you can use [`HttpRequest::getResponseCode `](http://www.php.net/manual/en/function.httprequest-getresponsecode.php) and [`HttpRequest::getResponseHeader `](http://www.php.net/manual/en/function.httprequest-getresponseheader.php) – nikc.org Jan 07 '10 at 07:33
  • It catches bit.ly urls for examples, but it does not catch this: – user220755 Jan 07 '10 at 07:59
  • That's true, catching meta-redirects would require parsing the HTML document and this mechanism only looks at the HTTP response. – nikc.org Jan 07 '10 at 08:01
  • Is there any other way to redirect? I want to be sure that if the page redirects, I know about it. Is there a way to be sure? (Sorry for so many questions, but I need to know if the page redirects in any way.) – user220755 Jan 07 '10 at 08:13
  • You would need to check and execute all javascript as well to be completely sure what happens after the page is rendered. – nikc.org Jan 07 '10 at 08:17

It's not easy.

It's not impossible, but it's pretty darn hard. These are the ways you can do a redirection:

Header Redirection.

This is where you ask for "gimmiemypage.php" and instead of sending "200 - OK" as the status, it sends a "30? - Redirected" status (where ? is usually 1 or 2) together with a Location header. This is really easy to detect, because curl will tell you. Hurrah.

HTML Refresh Redirection.

This is where you use a `<meta http-equiv="refresh">` tag and, one second after parsing that, the browser forwards you onwards.

This is harder to detect because you have to specifically look for meta tags, so you'll need to parse arbitrary HTML (Do Not Use Regexes for this, That Would Be Bad) to find those tags. They should always be in `<head>`, but those wacky karazee webdevelopers might hide them.
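A minimal sketch of that lookup, assuming PHP's DOM extension is available and the page body has already been downloaded into a string. The function name `findMetaRefreshUrl` is hypothetical. The HTML itself goes through a real parser; a regex is only applied to the short `content` attribute value (for example `1; url=http://...`).

```php
<?php
// Hypothetical helper: returns the target of the first <meta http-equiv="refresh">
// tag found in $html, or null if there is none.
function findMetaRefreshUrl(string $html): ?string
{
    if (trim($html) === '') {
        return null;
    }

    // Real-world HTML is often invalid; collect parser warnings silently
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Look at every <meta> tag, not only those inside <head>
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strcasecmp($meta->getAttribute('http-equiv'), 'refresh') !== 0) {
            continue;
        }
        // The content attribute looks like "1; url=http://example.com/next"
        $content = $meta->getAttribute('content');
        if (preg_match('/^\s*[\d.]+\s*[;,]\s*url\s*=\s*(.+?)\s*$/i', $content, $m)) {
            return trim($m[1], "\"'");
        }
    }

    return null;
}
```

A refresh tag without a `url=` part only reloads the same page, so the sketch treats it as no redirect.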

Then there are Javascript redirects. Finding these without evaluating the javascript to see what happens is almost impossible. There are various different ways you can redirect people in JS, but you could catch those with a parser. However, because this is JS, you'll end up needing to read and evaluate all the JS you can see on the page, and the included JS, and anything that includes...

My advice is to try and find a way that doesn't mean you need to know about all redirects, because it's a very deep well to fall into.

Community
Aquarion