
This is my first question so I apologise in advance if I format/ask it all wrong.

I am using Perl to extract a string from a file, submit a web form, and download a new file created by the web page. The aim is to have it run for 30,000 files in a loop, which I estimate will take ~8 days. I am using WWW::Selenium and WWW::Mechanize to perform the web automation. The issue I have is that if a page doesn't load properly, or the internet connection drops for a period of time, the script exits with an error message like this (depending on the stage at which it failed):

Error requesting http://localhost:4444/selenium-server/driver/:
ERROR: Could not find element attribute: link=Download PDB File@href

I would like the script to continue running and move on to the next iteration of the loop, so that a single failed iteration doesn't stop the whole run. My research suggests that Try::Tiny may be the best solution. Currently I have the script below using only try{...}, which seems to suppress any error and allow the script to continue through the files. However, I'm concerned that this is a very blunt solution and gives me no insight into which files failed or why.

Ideally I would like to print the filename and error message for each occurrence to another file that can be reviewed once the script is complete, but I am struggling to understand how to use catch{...} to do this, or whether that is even the correct solution.

use strict;
use warnings;
use WWW::Selenium;
use WWW::Mechanize;
use Try::Tiny;


my @fastas = <*.fasta>;
foreach my $file (@fastas) {
try{

open(my $fh, "<", $file) or die "Can't open $file: $!";   # die so the failure is visible to try/catch
my $sequence;
my $id = substr($file, 0, -6);
while (my $line = <$fh>) {

        ## discard fasta header line
        if ($line =~ /^>/) {     # / (turn off wrong coloring)
            next;

        ## keep line, add to sequence string
        } else {
            $sequence .= $line;
        }
    }
close ($fh);

my $sel = WWW::Selenium->new( host => "localhost",
                              port => 4444,
                              browser => "*firefox",
                              browser_url => "http://www.myurl.com",
                            );

$sel->start;
$sel->open("http://www.myurl.com");
$sel->type("chain1", $sequence);
$sel->type("chain2", "EVQLVESGPGLVQPGKSLRLSCVASGFTFSGYGMHWVRQAPGKGLEWIALIIYDESNKYYADSVKGRFTISRDNSKNTLYLQMSSLRAEDTAVFYCAKVKFYDPTAPNDYWGQGTLVTVSS");
$sel->click("css=input.btn.btn-success");
$sel->wait_for_page_to_load("30000");

## Wait through the holding page - will timeout after 5 mins
$sel->wait_for_element_present("link=Download PDB File", "300000");
## Get the filename part of link
$sel->wait_for_page_to_load("30000");
my $pdbName = $sel->get_attribute("link=Download PDB File\@href");
## Concatenate it with the main domain
my $link = "http://www.myurl.com/" . $pdbName;
$sel->stop;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech -> get($link);
#print $mech -> content();
$mech -> save_content($id . ".pdb");
};

}
zdim
PyPingu
  • Your code looks broken, for example the line `} elsif($line =~ /^>\/) {`. Check your posted code, please. – wolfrevokcats Jun 29 '16 at 18:30
  • Yeah I'm aware of that backslash. Without it the formatting on this site went haywire, I couldn't work out why but it isn't in my actual code. – PyPingu Jun 29 '16 at 21:03
  • I sympathize with that! I've edited your post to correct the error, and suppress further wrong coloring by the editor. You can do that by adding another slash in a comment on the same line, `# /`. I also added a statement (in that comment) on what it is for. If you don't like this please go ahead and change it, by all means. (But you don't want to post code that's wrong, no matter the format.) – zdim Jun 29 '16 at 21:10
  • Ah thanks! Do you know why that was affecting the coloring? – PyPingu Jun 29 '16 at 21:11
  • Well, it's these slashes (and/or other special symbols), whichever way they are paired (or not). I don't know how the editor parses things, but regex often results in this problem. (I am not certain that it is always the `/` that will turn it off, but it may be some other one.) That itself is finicky, too -- it has to be `# /` and only after that can you add text. – zdim Jun 29 '16 at 21:14
  • Hmm quite curious, I was very confused as it looked absolutely fine in my text editor of choice (sublime) – PyPingu Jun 29 '16 at 22:09
  • Oh, I mean it is the problem with the markdown/editor _here_, when you paste your code in this editor. It highlights things etc, and somewhere there those multiple slashes (when combined with some other characters) in a regex confuse it. – zdim Jun 30 '16 at 06:14

1 Answer

You are completely right that you want to see, log, and review all errors (and warnings). The mechanism and syntax provided by Try::Tiny are meant to be bare-bones and simple to use.

use warnings;
use strict;
use feature qw(say);
use Try::Tiny;

my @fastas = <*.fasta>;

my $errlog = 'error_log.txt';
open my $fh_err, '>', $errlog  or die "Can't open $errlog for writing: $!";

foreach my $file (@fastas) {
    try {
        # processing, potentially throwing a die
    }
    catch {
        say $fh_err "Error with $file: $_";   # NOTE, it is $_ (not $! or $@)
    };
}
close $fh_err;

# Remove the log if empty
if (-z $errlog) { 
    say "No errors logged, removing $errlog";
    unlink $errlog or warn "Can't unlink $errlog: $!";
}    

You can save names of files for which the processing failed, with push @failed_files, $file inside the catch { } block. Then the code can attempt again after the main processing, if you know that errors are mostly due to random connection problems. And having the list of failed files is handy.
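
A minimal sketch of that retry idea, assuming the failures are transient. The process_file sub and the three file names are hypothetical stand-ins for the real per-file work; the sub fails once on purpose to imitate a dropped connection.

```perl
use strict;
use warnings;
use Try::Tiny;

# Hypothetical stand-in for the real per-file processing. It dies on the
# first attempt for one file to imitate a transient connection problem.
my %attempts;
sub process_file {
    my ($file) = @_;
    $attempts{$file}++;
    die "connection dropped\n" if $file eq 'b.fasta' && $attempts{$file} == 1;
    return 1;
}

my @fastas = ('a.fasta', 'b.fasta', 'c.fasta');   # placeholder list
my (@failed_files, @still_failing);

# First pass: remember every file whose processing threw an exception
foreach my $file (@fastas) {
    try   { process_file($file) }
    catch { push @failed_files, $file };
}

# Second pass: retry only the files that failed the first time
foreach my $file (@failed_files) {
    try   { process_file($file) }
    catch { push @still_failing, $file };
}

print "Failed on first pass: @failed_files\n";
print "Still failing after retry: @still_failing\n";
```

Whatever remains in @still_failing after the second pass is probably worth inspecting by hand rather than retrying again.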

Note that as of Perl v5.14 the problems that this module addresses were fixed, so normal use of eval is fine. It is mostly a matter of preference at this point, but note that Try::Tiny has a few twists of its own. See this post for a discussion.
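
For comparison, here is a sketch of the same pattern with plain eval, assuming Perl v5.14 or later. The file names and the simulated failure are placeholders, and the messages are collected in an array here rather than written to a log file.

```perl
use strict;
use warnings;

my @files = ('a.fasta', 'b.fasta');   # placeholder list
my @errors;

foreach my $file (@files) {
    # The block returns 1 on success, so a false $ok means it died
    my $ok = eval {
        die "page did not load\n" if $file eq 'b.fasta';   # simulated failure
        1;
    };
    if (!$ok) {
        my $err = $@ || 'unknown error';   # with eval the error is in $@
        chomp $err;
        push @errors, "Error with $file: $err";
    }
}

print "$_\n" for @errors;
```

Testing the return value of eval, rather than testing $@ alone, is the usual idiom, since it still works if $@ happens to be empty or false.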

This addresses the question of the simple exception handling, not the rest of the code.

zdim