
Using the concurrent asynchronous URL example for Net::Async::HTTP, the first bad URL encountered (timeout, host doesn't exist, etc.) causes the program to fail and exit completely, without continuing to the next URL in the array. Is the problem my code or the module?

I tried setting fail_on_error to 0, and even 1, but neither had any obvious effect.

 #!/usr/bin/perl

 use IO::Async::Loop;
 use Net::Async::HTTP;
 use Future::Utils qw(fmap_void);
 use strict;
 use warnings;
 use feature 'say';

 my $ua_string = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36";
 my $timeout = 10;
 my $max_redirects = 10;
 my $max_in_flight = 10;
 my $max_connections_per_host = 10;
 my $stall_timeout = 10;
 my $max_recurse            = "10";
 my $max_per_host           = "10";

 my @URLs = ( "http://cnn.com", "http://google.com", "http://sdfsdfsdf24.com", "http://msn.net" );
 my $loop   = IO::Async::Loop->new();
 my $http = Net::Async::HTTP->new();

 $loop->add($http);
 my $future = fmap_void {
     my ( $url ) = @_;
     $http->configure(user_agent => $ua_string);
     $http->configure(timeout => $timeout );
     $http->configure(max_redirects => $max_redirects);
     $http->configure(max_in_flight => $max_in_flight);
     $http->configure(max_connections_per_host => $max_connections_per_host);
     $http->configure(stall_timeout => $stall_timeout);
     $http->configure(fail_on_error => '0' );

     $http->GET($url)->on_done(
         sub {
             my $response = shift;
             say "Response: $response->code";
           }
       )->on_fail(
         sub {
             my $fail = shift;
             say "Failed: $fail";
         }
       );
 }
 foreach => \@URLs;
 $loop->await($future);
  • fail_on_error: Affects the behaviour of response handling when a 4xx or 5xx response code is received. In your example the host 'sdfsdfsdf24.com' does not even return anything, as it doesn't exist, so this parameter will not have any effect. – user3606329 Jan 24 '17 at 19:35
  • Regardless of the fail_on_error boolean setting, when the program reaches sdfsdfsdf24.com it quits and doesn't go on to the next URL, which would be msn.net. I'm trying to resolve this so that it continues on to msn.net and eventually many more URLs. – user7464122 Jan 24 '17 at 19:38
  • I don't see any solution at the moment, as I don't know the module very well either, but you could try running a function that first resolves all hosts and filters out the bad ones. The fatal error happens because the host cannot be resolved (not because it's not online); hosts that are simply not online should not cause trouble. I made a concurrent call example with another module in this topic http://stackoverflow.com/questions/41252427/cant-fork-more-than-200-processes-sometimes-less-depending-on-memory-cpu-us/41253458#41253458, maybe it suits you. – user3606329 Jan 24 '17 at 20:26
  • I tried Mojo::IOLoop::Delay, but it just did not perform the way I was hoping (asynchronously). I'm not sure what the solution is. This seemed to have a chance, except that it falls over when a host can't be resolved or doesn't respond to an HTTP/S request. I may just go back to threads, or better yet hire someone. I'm completely out of ideas on how to fix this. – user7464122 Jan 24 '17 at 20:47
  • The Mojo example performs the calls concurrently, like in your example; the results are then posted into the callback once finished. Regarding your code: I would just try adding a function which resolves the hosts in @URLs and then only continues with the resolvable ones (see the sketch after these comments). According to the Net::Async::HTTP documentation failed connections are not fatal, but unresolvable hosts seem to be fatal. Resolving hosts can be done very fast. I don't find any parameter for handling unresolvable-host errors in the Net::Async::HTTP documentation. – user3606329 Jan 24 '17 at 21:46
  • I could do that easily. The problem is that it's failing with a bad proxy set. When the proxy doesn't work, it can't resolve the address, and thus it fails and quits. I'd happily do that if it solved the problem. It works fine, and wonderfully, when the proxy being used is working. The program itself is a proxy tester, and the URL being tested is actually always valid when the proxy is working. – user7464122 Jan 24 '17 at 22:51
  • This might sound naive as I've never used concurrent stuff, but can't you just wrap the whole `$http->GET()...;` call into an `eval` or `try` block to catch actual failures? Of course that assumes it `dies`. – simbabque Jan 24 '17 at 23:15
  • I'm using a proxy_host and proxy_port setting in the real-time testing, which fails the GET on $url almost 99% of the time. I did not include that part because I get the same result with a bad URL (which is what every URL looks like when a proxy can't resolve it). – user7464122 Jan 24 '17 at 23:34
  • A bad URL needs to be handled in a way that isn't fatal, but fatal is exactly what it seems to be when the host can't be resolved. That can happen with a bad host such as sdfsdfsdf24.com in my example, or when a bad proxy is set, as in my real-time program, which means every $url would be bad; the first occurrence is fatal and kills the program. – user7464122 Jan 24 '17 at 23:48
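
A minimal sketch of that pre-resolution filter, assuming the core Socket module and the CPAN URI module (the helper name resolvable_urls is just illustrative, not part of the question's code; note it uses the system resolver directly, not the proxy):

 #!/usr/bin/perl
 use strict;
 use warnings;
 use feature 'say';
 use Socket qw(getaddrinfo SOCK_STREAM);
 use URI;

 # Keep only URLs whose host actually resolves, so the async loop never
 # sees an unresolvable host.
 sub resolvable_urls {
     my @urls = @_;
     return grep {
         my $host = URI->new($_)->host;
         my ($err) = getaddrinfo( $host, 'http', { socktype => SOCK_STREAM } );
         !$err;    # getaddrinfo returns a false $err on success
     } @urls;
 }

 my @URLs = ( "http://cnn.com", "http://google.com", "http://sdfsdfsdf24.com", "http://msn.net" );
 say for resolvable_urls(@URLs);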

1 Answer


Your example actually works well without any proxy; I tested it and made some changes:

        say 'Fetching URL: ' . $url;

        $http->GET($url)->on_done(
            sub {
                my $response = shift;
                say "Response: ".$response->code();
            }
          )->on_fail(
            sub {
                my $fail = shift;
                say "Failed: " . $fail;
            }
          );

Output:

Fetching URL: http://cnn.com
Response: 200
Fetching URL: http://google.com
Response: 302
Fetching URL: http://sdfsdfsdf24.com
Response: 403
Fetching URL: http://msn.net
Response: 200

As this example is not making the calls asynchronously, the URLs are put in a queue and processed one by one.
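
If you do want several requests in flight at once, Future::Utils's fmap_void accepts a concurrent option; a minimal sketch against the question's loop (the value 10 is only an example):

 my $future = fmap_void {
     my ( $url ) = @_;
     say 'Fetching URL: ' . $url;
     $http->GET($url)
          ->on_done( sub { say "Response: " . shift->code } )
          ->on_fail( sub { say "Failed: " . shift } );
 } foreach => \@URLs, concurrent => 10;    # up to 10 requests in flight at once
 $loop->await($future);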

Behind the scenes, when you make a request to a target (in your case some URLs), the connection is made at a low level through a socket.

If there is a misconfigured proxy between your script and the internet, no connection is made; an exception is raised and your script dies like this:

Fetching URL: http://cnn.com
Failed: Timed out

The variable $! is set and the error "Operation now in progress" appears; in fact your request never established a connection, it only tried to, without success.
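
If the goal is simply to keep the loop alive past such a failure, one approach (a sketch, not something the module does for you by itself) is to turn each failed per-URL future back into a successful one with Future's else method:

 # Inside the fmap_void block: report the failure, then hand back an
 # already-completed future so fmap_void carries on with the next URL
 # instead of aborting the whole run. Future is loaded by Net::Async::HTTP.
 $http->GET($url)
      ->on_done( sub { say "Response: " . shift->code } )
      ->else( sub {
          my ( $failure ) = @_;
          say "Failed: " . $failure;
          Future->done;
      } );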

There are some points you can check, for example:

1 - Is the proxy working?

2 - Do I have an internet connection?

3 - Is the URL I am testing working?

If you are having problems with the proxy, your script needs a small adjustment; you can find more info in the docs:

$http->configure( proxy_host => 'xx.xx.xx.xx');
$http->configure( proxy_port => 1234);

Supposing that your proxy is configured, you can check whether you have full access to the internet by aiming at some targets like those URLs.

Trying to access the URLs will give you a response code, and depending on the code you can act accordingly.

As an alternative solution you could use LWP::UserAgent to make simple requests and check the response code.

 use LWP::UserAgent;

 my $ua = LWP::UserAgent->new;
 $ua->timeout(10);
 $ua->env_proxy;

 my $response = $ua->get('http://search.cpan.org/');

 if ($response->is_success) {
     print $response->decoded_content;  # or whatever
 }
 else {
     die $response->status_line;
 }
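
Since the real program is a proxy tester, note that LWP::UserAgent can also be pointed at an explicit proxy instead of relying on env_proxy; a small sketch, with the proxy address as a placeholder:

 use LWP::UserAgent;

 my $ua = LWP::UserAgent->new( timeout => 10 );
 # Route plain http requests through the proxy under test
 # (xx.xx.xx.xx:1234 is a placeholder, as in the configure example above).
 $ua->proxy( 'http', 'http://xx.xx.xx.xx:1234' );

 my $response = $ua->get('http://cnn.com');
 print $response->is_success
     ? "OK\n"
     : "Failed: " . $response->status_line . "\n";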

And even with merely bad statuses like 4xx, Net::Async::HTTP won't be a friendly module to use for a simple purpose like this, as it can't handle the exceptions the way you want.

– carlosn