
Using Perl, I'm looking for a simple way to perform a handful of HTTP requests in parallel, receiving the responses back in the same order I sent the requests once they complete, e.g.:

my ($google, $perl) = foobar(GET => 'http://www.google.com/',
                             GET => 'http://www.perl.org/');

Is there a module I should be looking at?

I know I can do the bookkeeping by hand, but I feel spoiled after being able to do this with jQuery's `when` method, and I'd love to have as simple a solution in Perl.

Thanks for your help.

Anirvan
  • FWIW: I just tried to get LWP::Parallel working for the last two hours and have been pulling my hair out. I do *NOT* recommend LWP::Parallel at all. It seems very buggy and the API as well as the documentation are lacking. – jlh Dec 29 '13 at 20:08

2 Answers

use threads;
use LWP::UserAgent qw( );

my $ua = LWP::UserAgent->new();
my @threads;
for my $url ('http://www.google.com/', 'http://www.perl.org/') {
   # Each thread gets its own clone of $ua when it's spawned.
   push @threads, async { $ua->get($url) };
}

for my $thread (@threads) {
   my $response = $thread->join;   # blocks only until *this* thread is done
   ...                             # process $response here
}

The best part is that the parent doesn't wait for all of the requests to complete. As soon as the response it's waiting on is ready, join returns and the parent unblocks to process it.


If you're using Parallel::ForkManager or something else that doesn't let you wait for a specific child, you can use the following code to order the results:

# create_task() and wait_for_a_child_to_complete() stand in for
# whatever job-control mechanism you're using.
for my $id (0..$#urls) {
   create_task($id, $urls[$id]);
}

my %responses;
for my $id (0..$#urls) {
   if (!exists($responses{$id})) {
      # A child finished, but not necessarily the one we're waiting on.
      my ($finished_id, $response) = wait_for_a_child_to_complete();
      $responses{$finished_id} = $response;
      redo;   # re-check whether response $id has arrived yet
   }

   my $response = delete($responses{$id});
   ...   # process $response here, in request order
}
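
For what it's worth, here's a minimal sketch of that pattern with Parallel::ForkManager (this concrete code isn't from the original answer, and it assumes LWP::UserAgent for the fetches). Each child sends its response back through finish(), the run_on_finish callback files it away under the request's index, and the parent then walks the results in request order. Note that this simple variant waits for all children before processing anything, unlike the loop above:

use LWP::UserAgent qw( );
use Parallel::ForkManager qw( );

my @urls = ('http://www.google.com/', 'http://www.perl.org/');

my $pm = Parallel::ForkManager->new(10);   # at most 10 children at a time

my %responses;
$pm->run_on_finish(sub {
   my ($pid, $exit, $id, $signal, $core, $data) = @_;
   $responses{$id} = $$data;   # $data is the ref the child passed to finish()
});

for my $id (0..$#urls) {
   $pm->start($id) and next;                # parent: child launched, keep looping
   my $response = LWP::UserAgent->new->get($urls[$id]);
   $pm->finish(0, \$response);              # child: serialize the response back
}

$pm->wait_all_children;                     # run_on_finish fires as children are reaped

for my $id (0..$#urls) {
   my $response = delete($responses{$id});
   ...                                      # process responses in request order
}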
ikegami
  • how about a Parallel::ForkManager example too? – ysth Dec 04 '11 at 06:24
  • @ysth, The point of P::FM is to work with a fixed-size pool of workers. That's good for sharing CPU. In this case, the OP wants as many workers as possible, since all of them will just be sleeping, waiting on IO. (If you want processes instead of threads, change `use threads;` to `use forks;`.) – ikegami Dec 04 '11 at 07:30
  • @ysth, Added an ordering algorithm you could use with P::FM. – ikegami Dec 04 '11 at 09:39
  • Is there any option to limit the number of threads (to 200, say) if there are more than 1000 URLs? – ovntatar Oct 31 '13 at 13:56
  • @ovntatar, Yes, but you should consider using Net::Curl::Multi. – ikegami Jul 16 '14 at 19:18
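
As an aside regarding ovntatar's question about capping the thread count, here's a minimal sketch (not from the original thread) of a fixed-size pool of worker threads fed from a Thread::Queue. It assumes a Thread::Queue new enough to have end() (Perl 5.18 or so); with older versions, enqueue one undef per worker instead:

use threads;
use Thread::Queue qw( );
use LWP::UserAgent qw( );

my @urls = ('http://www.google.com/', 'http://www.perl.org/');   # imagine 1000+ of these

my $q = Thread::Queue->new();

my @workers = map {
   threads->create(sub {
      my $ua = LWP::UserAgent->new();
      while (defined(my $url = $q->dequeue())) {
         my $response = $ua->get($url);
         ...   # process $response here (or push it onto a response queue)
      }
   });
} 1..200;   # pool of 200 threads, no matter how many URLs

$q->enqueue(@urls);
$q->end();                   # dequeue() returns undef once the queue drains
$_->join() for @workers;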

I am a fan of Mojo! From the Mojo::UserAgent documentation:

use Mojo::UserAgent;
use Mojo::IOLoop;

# Parallel requests
my $ua = Mojo::UserAgent->new;
$ua->max_redirects(5);
my $delay = Mojo::IOLoop->delay;
for my $url ('http://www.google.com/', 'http://www.perl.org/') {
  $delay->begin;                 # one begin() per pending request
  $ua->get($url => sub {
    my ($ua, $tx) = @_;
    $delay->end($tx->res->dom);  # end() hands this response's DOM to wait()
  });
}
my @responses = $delay->wait;    # blocks until every begin() has a matching end()
print join "\n", @responses;

Enjoy!

EDIT

By the way, you don't have to wait until the end to process the responses; you can process each one as it arrives:

# ...
$ua->get($url => sub {
    my ($ua, $tx) = @_;
    $delay->end(1);
    # process $tx->res here
});
# ...
$delay->wait;
esskar
  • I have an entry just like this for the [Perl Advent Calendar](http://perladvent.org) this year, although I don't know which day it will be on. – brian d foy Dec 04 '11 at 08:12
  • @brian d foy, In esskar's solution, the parent waits much longer than needed before starting to process responses. Can that be fixed simply? – ikegami Dec 05 '11 at 06:32