7

I am using LWP to download content from web pages, and I would like to limit the amount of time it waits for a page. This is accomplished in LWP like this:

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->get($url);

This works fine, except that when the timeout is reached, the script just dies and I can't continue. I'd really like to handle this timeout properly so that I can record that the URL timed out and then move on to the next one. Does anyone know how to do this? Thanks!

srchulo

3 Answers

17

LWP::UserAgent's get() returns an HTTP::Response object that you can use for checking errors:

use LWP::UserAgent;
use HTTP::Status ();

my $ua = LWP::UserAgent->new;
$ua->timeout(10);
my $response = $ua->get($url);

if ($response->is_error) {
    printf "[%d] %s\n", $response->code, $response->message;

    # record the timeout (this matches only a 408 status returned by the server)
    if ($response->code == HTTP::Status::HTTP_REQUEST_TIMEOUT) {
        ...
    }
}

Btw, the better practice nowadays is to use Try::Tiny instead of eval {...}. It gives you try {...} catch {...}; and avoids some pitfalls of checking $@ (see the Background section of the Try::Tiny documentation).

stevenl
  • Thanks a lot! This is really useful. I tested it though and for some reason even when there's a timeout it doesn't get inside that second if statement. "read timeout" is what is inside of $response->message. Do you know why it's not testing as true for the second if statement? – srchulo Jun 12 '12 at 04:50
  • Don't know for sure. Did you `use HTTP::Status`? What are the actual values of `$response->message` and `$response->code`? Is it an actual timeout (code 408)? – stevenl Jun 12 '12 at 05:10
  • I used the code exactly as it is above "HTTP::Status::HTTP_REQUEST_TIMEOUT". $response->message holds "read timeout" and $response->code holds "500". – srchulo Jun 12 '12 at 05:18
  • HTTP_REQUEST_TIMEOUT represents error code 408. Code 500 is a server error, so you might not want to just check for the timeout error. See [HTTP::Status](https://metacpan.org/module/HTTP::Status) for the full list of error codes. – stevenl Jun 12 '12 at 05:23
  • when the agent times out (as opposed to the server returning a timeout status) or has any other problem where there was no response from the server, it sets the response code to 500, and you can check the message to see the reason – ysth May 13 '21 at 17:50
  • The response code for the timeout from the user agent is now 500 instead of HTTP::Status::HTTP_REQUEST_TIMEOUT (408). See documentation at https://metacpan.org/pod/LWP::UserAgent#timeout. So to determine if the timeout occurred you have to check if the `Client-Warning` header is `"Internal response"` and check the message is `"read timeout"`. – bdrx Jun 24 '21 at 20:27
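Pulling the comments above together, a check for a timeout raised by LWP itself (rather than a 408 sent by the server) might be sketched as follows. The helper name is_client_timeout is hypothetical, and the "read timeout" message text is taken from the comments rather than guaranteed by LWP, so treat the pattern match as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $response looks like a timeout
# generated by LWP itself, 0 otherwise. $response is expected to behave
# like an HTTP::Response (code, header and message methods).
sub is_client_timeout {
    my ($response) = @_;

    # LWP reports its own failures with code 500 ...
    return 0 unless $response->code == 500;

    # ... and marks them with a Client-Warning header.
    my $warning = $response->header('Client-Warning') // '';
    return 0 unless $warning eq 'Internal response';

    # Assumption: the message contains "read timeout", as reported above.
    my $message = $response->message // '';
    return $message =~ /read timeout/i ? 1 : 0;
}
```

In the question's loop, the result of $ua->get($url) could be passed to this helper to decide whether to log the URL as timed out before moving on to the next one.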
2

For most purposes, LWP::UserAgent's timeout is sufficient, but it does have a drawback: it applies to each system call rather than to the aggregate of them. If you truly need a fixed timeout period, this is one of the things that LWPx::ParanoidAgent takes care of.
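To illustrate what a fixed overall limit means, here is a minimal sketch that uses Perl's built-in alarm around an arbitrary code reference. This is not how LWPx::ParanoidAgent is implemented; it is a different, commonly used technique. The helper name with_deadline is hypothetical, and alarm works in whole seconds, may not be available on every platform, and may fire late if the process is stuck inside a blocking C library call.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds of wall-clock time.
# Returns (1, $result) on success, or (0, $error) if $code died or timed out.
sub with_deadline {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        # Turn the alarm signal into an exception for this block only.
        local $SIG{ALRM} = sub { die "deadline exceeded\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;    # finished in time, so cancel the pending alarm
        1;
    };
    my $error = $@;
    alarm 0;        # also cancel the alarm if $code itself died
    return $ok ? (1, $result) : (0, $error || "unknown error\n");
}
```

A call such as with_deadline(10, sub { $ua->get($url) }) would then bound the whole request, not just each individual read, under the caveats above.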

daxim
ysth
1

You can do the equivalent of a try{} catch {} in Perl using eval blocks:

http://perldoc.perl.org/functions/eval.html
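As a rough sketch of that idea, the loop below wraps each call in an eval block, records any failure, and carries on with the next item. The risky function is a hypothetical stand-in for any operation that may die; it is not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical stand-in for any operation that may die.
sub risky {
    my ($n) = @_;
    die "cannot process $n\n" if $n % 2;    # odd inputs fail
    return $n * 10;
}

my (%results, %failures);
for my $n (1 .. 4) {
    my $value;
    # The eval returns 1 only if the block ran to completion.
    if (eval { $value = risky($n); 1 }) {
        $results{$n} = $value;
    }
    else {
        # $@ holds the error; record it and move on to the next item.
        $failures{$n} = $@ || 'unknown error';
    }
}
```

Ending the block with a true value and testing eval's return, rather than testing $@ alone, avoids treating an empty or clobbered $@ as success.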

Soz