
I know that with cURL I can see the destination URL by pointing cURL at the URL with CURLOPT_FOLLOWLOCATION set to true.

Example:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example1.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string
curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in $result
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP "Location:" redirects
$result = curl_exec($ch);
$info = curl_getinfo($ch); // information about the transfer; $info['url'] is the final URL
curl_close($ch);

$info['url'] will then hold the URL of the final destination, which could be www.example2.com. I hope my understanding above is correct; please let me know if not!
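As a small, hedged addition: `curl_getinfo()` also accepts a single `CURLINFO_*` constant, which makes the final URL and the number of redirects libcurl followed easy to read directly. This is a minimal sketch, assuming PHP 7+ with the curl extension; `fetch_final_url` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: fetch $url, let libcurl follow HTTP redirects,
// and report where the transfer ended up.
function fetch_final_url(string $url, int $maxRedirects = 10): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // follow 3xx "Location:" headers
    curl_setopt($ch, CURLOPT_MAXREDIRS, $maxRedirects); // fail if more redirects than this

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }

    // The handle is released when $ch goes out of scope.
    return [
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'body'      => $body,
    ];
}
```

Usage: `$r = fetch_final_url("http://www.example1.com"); echo $r['final_url'];`. This still only sees redirects that happen at the HTTP level.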

My main question is: which types of redirect can cURL detect? Apache (HTTP) redirects, JavaScript redirects, form-submission redirects, meta-refresh redirects?

Update: Thanks for your answers, @ceejayoz and @Josso. Is there a way to follow all of these redirects programmatically in PHP?

ajreal
jtanmay
  • Do you include all JavaScript and meta refresh redirects? Meta refresh is possible. – ajreal Dec 15 '10 at 21:05
  • @ajreal: JavaScript is also [possible](http://pecl.php.net/package/spidermonkey), just not with justifiable effort. – mario Dec 15 '10 at 21:10

5 Answers


cURL will not follow JS or meta tag redirects.

ceejayoz
  • So is there a way I can follow all the redirects programmatically through PHP? – jtanmay Dec 15 '10 at 20:43
  • Probably not in a reliable manner. You could parse out a meta refresh fairly easily, but there are so many ways to do it with JS (including calls to external .js files) that you'd probably never catch them reliably. – ceejayoz Dec 15 '10 at 20:45
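Parsing out a meta refresh can look roughly like this. It is a minimal sketch, assuming PHP 7.1+ with the bundled DOM extension; `find_meta_refresh_url` is a hypothetical name. It only handles the common `content="N; url=..."` form, and the returned URL may be relative, so it still has to be resolved against the page URL before fetching it.

```php
<?php
// Hypothetical helper: return the target of a
// <meta http-equiv="refresh" content="N; url=..."> tag, or null if there is none.
function find_meta_refresh_url(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strcasecmp(trim($meta->getAttribute('http-equiv')), 'refresh') !== 0) {
            continue;
        }
        // content looks like "5; url=http://www.example2.com/"; the delay is ignored here
        if (preg_match('/url\s*=\s*[\'"]?([^\'"\s]+)/i', $meta->getAttribute('content'), $m)) {
            return $m[1];
        }
    }
    return null;
}
```

JavaScript redirects are untouched by this, which matches the point above that they cannot be caught reliably.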

I know this answer is a little late, but I ran into a similar issue and needed more than just following HTTP 301/302 redirects. So I wrote a small library that also follows rel=canonical link tags and og:url meta tags.

https://github.com/mattwright/URLResolver.php

I found that meta refresh tags didn't provide much benefit, but they are still used if the response contains no head or body HTML tag.
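URLResolver.php's own code isn't reproduced here. As a rough sketch of how the two tags mentioned above could be read with PHP's bundled DOM extension (assuming PHP 7+; `find_canonical_urls` is a hypothetical name, and only exact lower-case attribute values are matched):

```php
<?php
// Hypothetical helper: collect the URLs a page declares as its own canonical address.
function find_canonical_urls(string $html): array
{
    $found = ['canonical' => null, 'og:url' => null];
    if (trim($html) === '') {
        return $found; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Suppress libxml warnings about malformed HTML while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // <link rel="canonical" href="...">
    $link = $xpath->query('//link[@rel="canonical"]/@href')->item(0);
    if ($link !== null) {
        $found['canonical'] = trim($link->nodeValue);
    }
    // <meta property="og:url" content="...">
    $og = $xpath->query('//meta[@property="og:url"]/@content')->item(0);
    if ($og !== null) {
        $found['og:url'] = trim($og->nodeValue);
    }
    return $found;
}
```

Unlike a real redirect, these tags only declare a preferred URL; whether to "follow" them is a policy decision for the caller.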

Matt

I just found this on the PHP site. It parses the response to find redirects and follows them. I don't think it catches every type of redirect, but it's pretty close.

http://www.php.net/manual/en/ref.curl.php#93163

I'd copy it here, but I don't want to plagiarize.
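The linked comment is not reproduced here. As an independent sketch of the same general idea (following `Location:` headers by hand instead of relying on CURLOPT_FOLLOWLOCATION), assuming PHP 7.1+; `follow_redirects`, `resolve_location` and `curl_fetch` are hypothetical names, and the URL resolution is deliberately simplified.

```php
<?php
// Hypothetical sketch: follow HTTP redirects by hand instead of using
// CURLOPT_FOLLOWLOCATION. $fetch takes a URL and returns
// [int $status, array $headers, string $body], with lower-cased header names.
function follow_redirects(string $url, callable $fetch, int $maxHops = 10): string
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        [$status, $headers] = $fetch($url);
        if ($status < 300 || $status >= 400 || !isset($headers['location'])) {
            return $url; // not a redirect, so this is the final URL
        }
        $url = resolve_location($url, $headers['location']);
    }
    throw new RuntimeException("Too many redirects (limit: $maxHops)");
}

// Resolve a Location value against the current URL. Simplified: absolute URLs,
// scheme-relative ("//host/...") and absolute paths ("/...") are handled; any
// other relative value is treated as if it started with "/".
function resolve_location(string $base, string $location): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }
    $parts = parse_url($base);
    if (strpos($location, '//') === 0) {
        return $parts['scheme'] . ':' . $location;
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    return $origin . '/' . ltrim($location, '/');
}

// One possible $fetch built on cURL, with automatic redirect-following switched off.
function curl_fetch(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);          // keep the headers in the output
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // redirects are followed above
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $headers = [];
    foreach (explode("\r\n", substr($response, 0, $headerSize)) as $line) {
        if (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
    }
    return [curl_getinfo($ch, CURLINFO_RESPONSE_CODE), $headers, substr($response, $headerSize)];
}
```

Usage: `echo follow_redirects("http://www.example1.com", 'curl_fetch');`. Passing the fetcher in as a callable keeps the loop testable without a network, and leaves a single place where a meta refresh check on the body could be added.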

SeanDowney

As far as I know, it only follows HTTP header redirects (3xx responses with a Location header, such as 301 and 302).

johankj

curl is a multi-protocol library that provides basic HTTP support, but not much more that will help in your case. As a workaround, you could manually scan the response for the meta refresh tag.

A better idea, though, would be to check out PEAR HTTP_Request or the Zend_Http class, which are more likely to provide something like this already. phpQuery might also be relevant: it comes with its own HTTP functions and could easily ->find("meta[http-equiv=refresh]") if needed. Or look for a Mechanize-like browser class (see "Is there a PHP equivalent of Perl's WWW::Mechanize?").

mario