
I am trying to use tryCatch. I want to run through a list of URLs stored in page1URLs, and if readHTMLTable() fails on one of them, I want to record which URL failed and have the code continue with the next URL without crashing.

I think I don't have the right idea here at all. Can anyone suggest how I can do this?

Here is the beginning of the code:

baddy <- rep(NA,10,000)
badURLs <- function(url) { baddy=c(baddy,url) }

writeURLsToCsvExtrema(38.361042, 35.465144, 141.410522, 139.564819)

writeURLsToCsvExtrema <- function(maxlat, minlat, maxlong, minlong) {

urlsFuku <- page1URLs
allFuku <- data.frame() # need to initialize it with column names

for (url in urlsFuku) {

    tryCatch(temp.tables=readHTMLTable(url), finally=badURLs(url))

    temp.df <- temp.tables[[3]]
    lastrow <- nrow(temp.df)
    temp.df <- temp.df[-c(lastrow-1,lastrow),] 

}
hrbrmstr
onyourmark

1 Answer


One general approach is to write a function that fully processes one URL, returning either the computed value or NULL to indicate failure:

library(XML)  ## provides readHTMLTable()
FUN = function(url) {
    tryCatch({
        xx <- readHTMLTable(url)  ## will sometimes fail, invoking 'error' below
        ## more calculations
        xx  ## final value
    }, error=function(err) {
        ## what to do on error? could return conditionMessage(err) or other...
        NULL
    })
}
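If the error message itself is useful, the handler can return it instead of a bare NULL, as the comment in the handler hints. The following is only a sketch: `fake_reader` is a hypothetical stand-in for readHTMLTable() so that the example runs without network access, and `safe_read` and its `reader` argument are names made up for illustration.

```r
## Hypothetical stand-in for readHTMLTable(): fails for URLs containing "bad".
fake_reader <- function(url) {
    if (grepl("bad", url, fixed = TRUE))
        stop("cannot read ", url)
    list(data.frame(x = 1:3))
}

## Returns list(value=, error=); exactly one of the two is non-NULL.
safe_read <- function(url, reader = fake_reader) {
    tryCatch({
        list(value = reader(url), error = NULL)
    }, error = function(err) {
        ## keep the message so the reason for the failure is not lost
        list(value = NULL, error = conditionMessage(err))
    })
}

ok  <- safe_read("http://example.org/good")
bad <- safe_read("http://example.org/bad")
```

With real pages, passing `reader = readHTMLTable` should behave the same way, assuming the XML package is loaded.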

and then use this, e.g., with a named vector:

urls <- c("http://cran.r-project.org", "http://stackoverflow.com", 
          "http://foo.bar")
names(urls) <- urls           # add names to urls, so 'result' elements are named
result <- lapply(urls, FUN)

These URLs failed (returned NULL):

> names(result)[sapply(result, is.null)]
[1] "http://foo.bar"

And these are the results for further processing:

final <- Filter(Negate(is.null), result)
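Applied to the loop in the question, the whole thing might look roughly like the sketch below. It assumes each page yields at least three tables and that the last two rows of the third table should be dropped, as in the question's code. `fake_reader` and `process_url` are hypothetical names, and `fake_reader` only imitates readHTMLTable() so the example runs offline.

```r
## Hypothetical stand-in for readHTMLTable(): returns three tables,
## or signals an error for the URL "http://foo.bar".
fake_reader <- function(url) {
    if (url == "http://foo.bar")
        stop("could not resolve host")
    tbl <- data.frame(id = 1:5, value = letters[1:5])
    list(tbl, tbl, tbl)
}

## Fully process one URL; NULL signals failure.
process_url <- function(url, reader = fake_reader) {
    tryCatch({
        temp.df <- reader(url)[[3]]   # third table on the page
        head(temp.df, -2)             # drop the last two rows
    }, error = function(err) NULL)
}

urls <- c("http://a.example", "http://foo.bar", "http://b.example")
names(urls) <- urls
result  <- lapply(urls, process_url)

## record of the URLs that failed
failed  <- names(result)[vapply(result, is.null, logical(1))]
## one data frame built from the URLs that worked
allFuku <- do.call(rbind, Filter(Negate(is.null), result))
```

Here `failed` plays the role of the `baddy` vector from the question, but it is derived from the result rather than modified from inside the handler.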
Martin Morgan
  • are saying to define FUN like this: `FUN = function(url) { tryCatch({ temp.tables=readHTMLTable(url) #do some other stuff }, error=function(err) { badURLS=c(badURLS,url) #this may be unnecessary according to what you suggested. NULL }) }` and then run it like this: `result <- lapply(urlsFuku, FUN)` – onyourmark Apr 02 '14 at 03:06
  • Just use `error=function(err) { NULL }`; the result from lapply will be a named list, and the names of the elements whose value is NULL are the failed URLs. – Martin Morgan Apr 02 '14 at 06:10
  • I am not clear what `names(urls) <- urls` is for. Do I need it? – onyourmark Apr 03 '14 at 05:25
  • Compare `lapply(c(1, 2), function(x) x)` with `lapply(c(a=1, b=2), function(x) x)`. In the second case, the argument to lapply has names, and the result has names. So you know which result is associated with which original element. – Martin Morgan Apr 03 '14 at 05:38
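
The comparison in the last comment can be checked directly. This is a small sketch; the variable names `unnamed` and `named` are made up here.

```r
unnamed <- lapply(c(1, 2), function(x) x)
named   <- lapply(c(a = 1, b = 2), function(x) x)

names(unnamed)   # NULL: nothing links a result back to its input
names(named)     # "a" "b": each result carries the name of its input
```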