I am using WWW::Mechanize to crawl sites. It works great, except that when I request a URL such as:

http://www.levi.com/

I am redirected to:

http://us.levi.com/home/index.jsp

For my script, I need to know that this redirect took place and which URL I was redirected to. Is there any way to detect this with WWW::Mechanize or LWP and then get the redirected URL? Thanks!

Ωmega
srchulo

2 Answers

use strict;
use warnings;
use URI;
use WWW::Mechanize;

my $url = 'http://...';
my $mech = WWW::Mechanize->new(autocheck => 0);  # do not die on non-2xx responses
$mech->max_redirect(0);                           # do not follow redirects automatically
$mech->get($url);

my $status = $mech->status();
if (($status >= 300) && ($status < 400)) {
  my $location = $mech->response()->header('Location');
  if (defined $location) {
    print "Redirected to $location\n";
    $mech->get(URI->new_abs($location, $mech->base()));
  }
}

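With automatic redirects disabled via max_redirect(0), the response to the original request is returned as-is. If its status code is 3xx, check the Location response header for the redirect URL.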
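The code above passes the Location value through URI->new_abs because that header may hold a relative reference rather than a full URL. A minimal sketch of what that call does, using made-up Location values and the URL from the question as the base:

```perl
use strict;
use warnings;
use URI;

# Assumed values for illustration: the URL that was requested, and two
# Location headers a server might plausibly send back.
my $base = 'http://www.levi.com/';

# A relative Location is resolved against the base URL.
my $from_relative = URI->new_abs('/home/index.jsp', $base);

# An absolute Location is returned unchanged.
my $from_absolute = URI->new_abs('http://us.levi.com/home/index.jsp', $base);

print "$from_relative\n";   # http://www.levi.com/home/index.jsp
print "$from_absolute\n";   # http://us.levi.com/home/index.jsp
```

Either way, the result is an absolute URL that can be handed to $mech->get().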

Ωmega
  • If I wanted to allow redirects again, or like reset the redirect count, is there a way I could do that? Or for instance, could I follow a string of redirects to their final location and still know that the status was between 300 and 400? I got rid of max_redirect(0), but then I just got a status of 500 and I know that's not right... – srchulo Jun 07 '12 at 02:12
  • if anyone looks at this for reference, simply storing a new WWW::Mechanize object in $mech does the trick. – srchulo Jun 07 '12 at 04:03

You can also get the same information, without disabling redirects, by calling the redirects() method on the response object.

use strict;
use warnings;
use feature qw( say );

use WWW::Mechanize;

my $ua = WWW::Mechanize->new;
my $res = $ua->get('http://metacpan.org');

my @redirects = $res->redirects;
if (@redirects) {
  # The last element is the redirect that led to the final response.
  say 'request uri: ' . $redirects[-1]->request->uri;
  say 'location header: ' . $redirects[-1]->header('Location');
}

Prints:

request uri: http://metacpan.org
location header: https://metacpan.org/

See https://metacpan.org/pod/HTTP::Response#$r-%3Eredirects for details. Keep in mind that more than one redirect may have taken you to your current location, so you may want to inspect every response returned by redirects().
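As a rough sketch of walking the whole chain, the example below builds a fake two-hop redirect chain by hand so it runs without network access. The fake_response helper and the example.com URLs are made up for illustration. With WWW::Mechanize you would call redirects() on the response returned by get() instead.

```perl
use strict;
use warnings;
use feature qw( say );

use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: builds a response with a request, an optional
# Location header, and an optional link to the previous response.
sub fake_response {
  my ($code, $url, $location, $previous) = @_;
  my $res = HTTP::Response->new($code);
  $res->request(HTTP::Request->new(GET => $url));
  $res->header(Location => $location) if defined $location;
  $res->previous($previous) if defined $previous;
  return $res;
}

my $hop1  = fake_response(301, 'http://example.com/', 'http://www.example.com/');
my $hop2  = fake_response(302, 'http://www.example.com/', 'http://www.example.com/home', $hop1);
my $final = fake_response(200, 'http://www.example.com/home', undef, $hop2);

# redirects() returns the intermediate responses, oldest first.
my @hops;
for my $r ($final->redirects) {
  push @hops, sprintf('%d %s -> %s', $r->code, $r->request->uri, $r->header('Location'));
}

say for @hops;
say 'final uri: ' . $final->request->uri;
```

An empty list from redirects() means the request was not redirected at all.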

oalders