
I'm using RSelenium and lapply() to scrape a fairly complex set of pages. Occasionally a page does not load as expected and the code fails.

It happens so rarely that, rather than trying to handle every possible error, I would just like to skip the current iteration and go on to the next. It looks like tryCatch() is what I'm looking for, but I'm not sure where to put it in the code.

I know this example is not complete, but I hope it is enough to go on. From what I have read, tryCatch() goes either around lapply() or around the return() statement. Thanks in advance.

team_id <- 1:10
df_list <- lapply(seq_along(team_id), function(x) {
    # complex navigation and scraping of multiple sub tables
    # to create a final teamtable
    <code>
    return(teamtable)
})
df <- data.table::rbindlist(df_list)

seansteele

1 Answer


As an example, let's take the square root of each element of a list.

x <- list(1, 3, 4, 'a', 5)

do.call(rbind, lapply(x, function(p) {
       sqrt(p)
}))

Error in sqrt(p) : non-numeric argument to mathematical function

To handle the error, wrap the call in tryCatch() as follows.

do.call(rbind, lapply(x, function(p) {
    tryCatch(sqrt(p), error = function(e) return(NULL))
}))

#         [,1]
#[1,] 1.000000
#[2,] 1.732051
#[3,] 2.000000
#[4,] 2.236068

Depending on what you want the final output to look like, the error handler can return either NULL or NA. NULL values are ignored when you rbind, so the failed element simply disappears. NA values stay in the data, marking the positions where the input was not what you expected and an error occurred.
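To make the difference concrete, here is a small sketch that runs the same `sqrt()` example both ways. The names `with_null` and `with_na` are only illustrative, and `NA_real_` is used so the placeholder has the same type as the other results.

```r
x <- list(1, 3, 4, "a", 5)

# Handler returns NULL: the failed element is dropped by rbind,
# so later results move up one row.
with_null <- do.call(rbind, lapply(x, function(p) {
  tryCatch(sqrt(p), error = function(e) NULL)
}))

# Handler returns NA_real_: the failed element keeps its own row.
with_na <- do.call(rbind, lapply(x, function(p) {
  tryCatch(sqrt(p), error = function(e) NA_real_)
}))

nrow(with_null)        # 4
nrow(with_na)          # 5
which(is.na(with_na))  # 4, the position of the bad input
```

If you need to know afterwards which inputs failed, the NA version keeps that information; if you only care about the combined result, the NULL version is simpler.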


In your case, you can do:

df <- do.call(rbind, lapply(seq_along(team_id), function(x) {
    tryCatch({
        <code>
        return(teamtable)
    }, error = function(e) return(NULL))
}))
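As a runnable sketch of the same pattern, the block below replaces the real scraping code with a hypothetical `scrape_team()` function that fails for one team. That function, its `score` column and the failing id are assumptions made purely for illustration. The handler also prints a message, which is optional but may help you see which iterations were skipped.

```r
# Hypothetical stand-in for the real navigation and scraping code;
# it deliberately fails for team 4 to simulate a page that did not load.
scrape_team <- function(id) {
  if (id == 4) stop("page did not load")
  data.frame(team_id = id, score = id * 10)
}

team_id <- 1:6

df_list <- lapply(team_id, function(id) {
  tryCatch(
    scrape_team(id),
    error = function(e) {
      # Report the failed iteration, then skip it by returning NULL.
      message("Skipping team ", id, ": ", conditionMessage(e))
      NULL
    }
  )
})

# Drop the skipped iterations explicitly before combining.
df <- do.call(rbind, Filter(Negate(is.null), df_list))
```

Here `df` ends up with one row for every team except the one that failed, and the loop carries on past the error instead of stopping.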
Ronak Shah
  • So based on your example I would want to add `tryCatch( return(teamtable), error = function(e) return(NULL))`? I'm getting an _expected ',' after expression_ warning throughout my `<code>`. – seansteele Jul 31 '20 at 01:14
  • Try to wrap your `<code>` in `{}`. See the update in the answer. – Ronak Shah Jul 31 '20 at 01:27
  • Why not return NA instead of NULL? NULL removes the element, distorting the order, etc. – Onyambu Jul 31 '20 at 01:30
  • Yes, as I noted in the answer, we can return either `NA` or `NULL` depending on the OP's needs. Since the OP is scraping, sometimes all you are interested in is the final combined table. – Ronak Shah Jul 31 '20 at 01:40
  • Thanks @RonakShah, that seemed to do the trick. @Onyambu, yes, I just wanted to skip it because at that level it didn't have the underlying tables to grab data from, but your suggestion of `NA` was helpful in another part of the script, so thank you both. – seansteele Jul 31 '20 at 20:26