I'm pretty new to Perl, so I might be missing something obvious, but I'm debugging a bug and have narrowed the problem down to the following piece of code.

my $fetch_urls = [];
for my $input_medium ( @{ $input_media } )
{
    $input_medium->{ medium } = MediaWords::DBI::Media::Lookup::find_medium_by_url( $db, $input_medium->{ url } );
    if ( $input_medium->{ medium } )
    {
        $input_medium->{ status } = 'existing';
    }
    else
    {
        if ( MediaWords::Util::URL::is_http_url( $input_medium->{ url } ) )
        {
            push( @{ $fetch_urls }, $input_medium->{ url } );
        }
        else
        {
            WARN "URL is not HTTP(s): " . $input_medium->{ url };
        }
    }
}

This code is supposed to go through all of the input_media, which are URLs, and check whether each one already exists in the system. If it doesn't, the code checks whether the URL is valid using the is_http_url function (at least, this is what I think it's doing). What I want to do is add a timeout so that, if the URL hasn't responded by then, I push the message "URL was unreachable". Any ideas or suggestions will be highly appreciated.

E_K
  • What is `MediaWords::Util::URL`? I can't find it on the internet. If it's some home-cooked module, then look over its source to see whether it implements a timeout. Otherwise, you can do it using generic Perl tools. – zdim Mar 11 '21 at 03:52
  • Yes, it's home-cooked. You can see it here: https://github.com/mediacloud/backend/blob/master/apps/webapp-api/src/perl/MediaWords.pm. I haven't been able to navigate through it yet. – E_K Mar 11 '21 at 04:03
  • OK, thank you. But I still don't see the module that contains the `is_http_url` method (the one for which you need a timeout), which is `MediaWords::Util::URL`. At the link I can only see `MediaWords.pm`. – zdim Mar 11 '21 at 05:33
  • Here are a couple of posts that I can readily find (since they're mine), on how to implement a timeout in Perl, [here](https://stackoverflow.com/a/44535265/4653379) and [here](https://stackoverflow.com/a/37817494/4653379). See the linked docs for `alarm`. There are of course many more posts out there. – zdim Mar 11 '21 at 05:45
  • Thank you for this, @zdim. I think I'll first need to find where the `is_http_url` method is defined and then try your recommendations. – E_K Mar 11 '21 at 06:21
  • Yes, by all means first find the method and carefully examine it. (It may use other tools that have timers.) The point is, you don't want your timer on top of another: there can only be one `alarm` operating at a time :). See the second linked answer above ([this](https://stackoverflow.com/a/37817494/4653379)) for a great example of how things may go. Then, if the library _doesn't_ do that (nor calls a tool that does), add your own. And test like crazy, of course. – zdim Mar 11 '21 at 06:28
  • I would use `LWP::UserAgent` and its `timeout` method rather than `alarm`. See for instance this Stack Overflow question: [handle lwp timeout effectively](https://stackoverflow.com/a/10990114/4990392). Still, the right tool to use might depend on how `is_http_url` is implemented. – Dada Mar 11 '21 at 07:07
  • Thanks for your help, @zdim and @Dada. It turns out it was a Python module that was being imported using `Inline::Python`, so I only needed to implement the timeout there. – E_K Mar 12 '21 at 01:43
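
For readers who land here with a pure-Perl check, the generic `alarm` approach mentioned in the comments can be sketched roughly as below. This is only a minimal illustration, not the code from the project: `run_with_timeout` and `slow_check` are hypothetical names invented for the example, and `slow_check` merely simulates a URL check that hangs. As the comments warn, only one `alarm` can be active at a time, so this should not be wrapped around code that already sets its own. It is also worth testing whether the alarm actually interrupts the call in question, since Perl may defer signal handling while a long-running non-Perl call is in progress.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code->() and give up after $seconds.
# Returns (1, result) on success and (0, undef) on timeout.
# Any other error raised by $code is rethrown unchanged.
sub run_with_timeout {
    my ( $seconds, $code ) = @_;

    my $result;
    my $ok = eval {
        # The trailing newline keeps the message exact, so a timeout
        # can be told apart from other failures.
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm( $seconds );
        $result = $code->();
        alarm( 0 );    # cancel the pending alarm on success
        1;
    };
    alarm( 0 );        # also cancel it if $code died for another reason

    return ( 1, $result ) if $ok;
    return ( 0, undef ) if $@ eq "timeout\n";
    die $@;            # not a timeout: propagate the original error
}

# Hypothetical stand-in for a URL check; "slow" URLs block for 3 seconds.
sub slow_check {
    my ( $url ) = @_;
    select( undef, undef, undef, 3 ) if $url =~ /slow/;
    return 1;
}

my @unreachable;
for my $url ( 'http://fast.example/', 'http://slow.example/' ) {
    my ( $finished, $is_valid ) = run_with_timeout( 1, sub { slow_check( $url ) } );
    push( @unreachable, "URL was unreachable: $url" ) unless $finished;
}

print "$_\n" for @unreachable;
```

In the loop from the question, the same wrapper would go around the call that can hang, with the "URL was unreachable" message pushed in the timeout branch. If the check is done with `LWP::UserAgent`, its own `timeout` setting is probably the simpler choice, as suggested above.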

0 Answers