
I am writing a Perl script to extract certain data using curl commands, e.g.:

my $raw_json = `curl -X GET <some website url> -H <some parameters>`;

The issue is that this website sometimes crashes and my code gets stuck at the same place for a long time. I want the code to skip this line and move on if the request takes longer than a specified time, say 30 seconds.

I tried using $SIG{ALRM} in my script as follows:

my $timeout = 30;
eval {
    local $SIG{ALRM} = sub { die "alarm\n" };  # NB: \n required
    alarm $timeout;
    my $raw_json = `curl -X GET <some website url> -H <some parameters>`;
    alarm 0;
};
if ($@) {
    # timed out (or some other error was raised inside the eval)
    print "\nERROR!\n";
    die;
}
else {
    # didn't time out
}

I expected the run to stop after 30 seconds. What happens instead is that I do get the "ERROR!" message printed after 30 seconds, but the curl request keeps running even after that.

    Very related: https://unix.stackexchange.com/questions/94604/does-curl-have-a-timeout TL;DR Use the curl options `--connect-timeout` and `--max-time`. – Ted Lyngmo May 09 '23 at 10:30
  • Whenever you use a system call to do something, know that it is most often a shortcut. Sometimes you run into issues with your shortcut, and you start problem solving and realize it is no longer a shortcut. That is what is happening here. You should look for a Perl library to do the job for you. I googled and found [`WWW::Curl`](https://metacpan.org/pod/WWW::Curl), for example, but I don't know if it is a good one. – TLP May 09 '23 at 10:42

3 Answers


The curl command is running in a subprocess, so you need to stop that subprocess; Perl isn't going to stop it for you. Your alarm interrupts the backticks in the parent, but the child process running curl keeps going.

Use curl's --connect-timeout or --max-time options so you don't need the alarm and curl cleans up after itself.
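
For example, a minimal sketch (the URL and header are placeholders from the question; on a timeout curl exits non-zero):

my $raw_json = `curl -sS --connect-timeout 10 --max-time 30 -X GET <some website url> -H <some parameters>`;
if ($? != 0) {
    # curl failed or timed out; skip this request and move on
    warn "curl failed (exit status ", $? >> 8, ")\n";
}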

As @ikegami suggested, the next simplest thing is IPC::Run, which can handle the details of a timeout for an external process.
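
A sketch of that approach, assuming IPC::Run is installed (the command is a placeholder built from the question):

use IPC::Run qw(run timeout);

my @cmd = ('curl', '-sS', '-X', 'GET', '<some website url>', '-H', '<some parameters>');
my ($in, $out, $err) = ('', '', '');
eval {
    # run() throws if the timer expires before curl finishes
    run \@cmd, \$in, \$out, \$err, timeout(30);
};
if ($@ =~ /timeout/) {
    warn "request timed out, moving on\n";
}
elsif ($@) {
    die $@;  # some other failure
}
my $raw_json = $out;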

Or, if you want to handle the alarm yourself, you need to work at a lower level so you have the PID of the subprocess and can kill it yourself. See perlipc.
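
A rough sketch of that lower-level route, in the spirit of perlipc (Unix-ish systems; the command is again a placeholder):

# A pipe-open returns the child's PID, so on timeout we can
# kill the still-running curl instead of leaving it behind.
my $pid = open(my $fh, '-|', 'curl', '-sS', '<some website url>')
    or die "can't fork: $!";

my $raw_json = eval {
    local $SIG{ALRM} = sub { die "alarm\n" };
    alarm 30;
    local $/;           # slurp all of curl's output
    my $data = <$fh>;
    alarm 0;
    $data;
};
if ($@) {
    die $@ unless $@ eq "alarm\n";  # propagate unexpected errors
    kill 'TERM', $pid;              # timed out: stop curl ourselves
    close $fh;
    warn "timed out, curl killed\n";
}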

brian d foy
  • To avoid going to a lower level, you could switch to IPC::Run which supports timeouts. The first idea sounds best, though. – ikegami May 09 '23 at 13:33

It's best to set curl's own timer, and you can do that using Perl's libcurl wrapper, Net::Curl::Easy:

use warnings;
use strict;
use feature 'say';

use Net::Curl::Easy qw(:constants); 

my $curl = Net::Curl::Easy->new;
$curl->setopt(CURLOPT_URL, "www.example.com");
$curl->setopt(CURLOPT_TIMEOUT, 30);

$curl->perform;

See the constants in curl_easy_setopt, or the straight-up list in easy_setopt_options. Here I use CURLOPT_TIMEOUT, which limits the whole operation, while there is also CURLOPT_CONNECTTIMEOUT to consider; there are yet other timeouts.

This module uses a C-style interface, but then again that will be familiar from using curl itself.


A more realistic use, with the returned document stored in a variable and perform checked for errors (most methods throw on errors):

my $curl = Net::Curl::Easy->new;
$curl->setopt(CURLOPT_URL, "www.example.com");
$curl->setopt(CURLOPT_WRITEDATA, \my $response);  # or declare earlier
$curl->setopt(CURLOPT_TIMEOUT, 30);

eval { $curl->perform };
if ($@ and ref $@ eq "Net::Curl::Easy::Code" ) {
    die "curl eval-ed: $@";
}
elsif ($@) { die $@ }  # probably not curl error, re-raise

say $response; 

Newer Perls have nicer, try-catch-style ways to handle exceptions. See for example this post for an example and links.
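
A sketch of the same error check using the try feature (experimental as of Perl 5.34, so it needs to be enabled explicitly):

use feature 'try';
no warnings 'experimental::try';

try {
    $curl->perform;
}
catch ($e) {
    die "curl eval-ed: $e" if ref $e eq "Net::Curl::Easy::Code";
    die $e;  # probably not a curl error, re-raise
}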

zdim

Another approach is the LWP::UserAgent module, which gives you greater control over what happens with your request: you can define timeouts and everything else you need to send a request and analyze the response.
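
For instance, a minimal sketch (the URL and header are placeholders):

use LWP::UserAgent;

my $ua  = LWP::UserAgent->new(timeout => 30);  # give up after 30 seconds
my $res = $ua->get('<some website url>', 'Some-Header' => '<some value>');

if ($res->is_success) {
    my $raw_json = $res->decoded_content;
}
else {
    warn "request failed: ", $res->status_line, "\n";
}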

HTTP::Tiny is an alternative designed for simple requests. In both cases a module needs to be installed, but that's an easy task.
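
A corresponding sketch with HTTP::Tiny (again with placeholder URL and header; on a timeout it reports a failed response rather than hanging):

use HTTP::Tiny;

my $http = HTTP::Tiny->new(timeout => 30);
my $res  = $http->get('<some website url>',
                      { headers => { 'Some-Header' => '<some value>' } });

if ($res->{success}) {
    my $raw_json = $res->{content};
}
else {
    warn "request failed: $res->{status} $res->{reason}\n";
}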

Miguel Prz