
I have two questions regarding loops in R.

I'm using the XML package to scrape some tables from websites and combine them with rbind. The following code works without issues as long as the price data and tables are present on the given websites.

library(XML)    # htmlParse, getNodeSet, readHTMLTable
library(RCurl)  # getURL

url.list <- c("www1", "www2", "www3")
big.data <- NULL  # accumulator for the combined tables

for (url_var in url.list)
{
  url.parsed <- htmlParse(getURL(url_var), asText = TRUE)
  tableNodes <- getNodeSet(url.parsed, '//*[@id="table"]/table')
  newdata <- readHTMLTable(tableNodes[[1]], header = FALSE, stringsAsFactors = FALSE)
  big.data <- rbind(newdata, big.data)
  Sys.sleep(30)
}

But sometimes a web page does not have the corresponding table (in that case I'm left with a one-variable table containing the message "No current prices reported.") and my loop stops with the following error, since the numbers of table columns do not match:

 Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match 

I want R to ignore the error and move on to the next web page, skipping the one that has a different number of columns.

At the end of the loop I have Sys.sleep(30). Does it force R to wait 30 seconds before it tries the next web page?

Dave2e
Behzod A
    The standard way of dealing with errors in R and allowing the code to continue is `help("tryCatch")`. – Rui Barradas Apr 14 '18 at 17:12
  • Perhaps you just need something like `if(nrow(newdata)>0) big.data <- rbind(...)` - or a similar condition depending on what is actually returned in the case of there being no current prices. – Andrew Gustar Apr 14 '18 at 17:45
  • Agree with Andrew that some sort of pre-qualification step ought to do this, but would note that it is the number of columns (and their names) that is at issue when attempting to `rbind` tables or dataframes. – IRTFM Apr 14 '18 at 22:19
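The pre-qualification check suggested in the comments above could be sketched like this (the helper name `safe_bind` is illustrative, not from the original post; it assumes `big.data` holds the tables combined so far, or NULL before the first page):

```r
# Hypothetical helper: only rbind when the column counts match,
# otherwise return the accumulated data unchanged (i.e. skip the page).
safe_bind <- function(big.data, newdata) {
  if (is.null(big.data)) return(newdata)   # nothing accumulated yet
  if (ncol(newdata) == ncol(big.data)) {
    rbind(newdata, big.data)
  } else {
    big.data                               # column mismatch: skip this table
  }
}
```

Inside the loop, `big.data <- safe_bind(big.data, newdata)` would then replace the bare `rbind` call.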

1 Answer


As @RuiBarradas mentioned in the comments, tryCatch is how we handle errors (and even warnings) in R. Specifically, in your case you want to skip to the next iteration when an error occurs, so you can do something like:

for (url_var in url.list) {
    url.parsed <- htmlParse(getURL(url_var), asText = TRUE)
    tryCatch({
        # Try to run the code within these braces
        tableNodes <- getNodeSet(url.parsed, '//*[@id="table"]/table')
        newdata <- readHTMLTable(tableNodes[[1]], header = FALSE, stringsAsFactors = FALSE)
        big.data <- rbind(newdata, big.data)
    },
    # If an error occurs, the handler reports it and returns control to
    # the loop body, so execution falls through to Sys.sleep(30) and then
    # moves on to the next URL. Note the handler must be a function;
    # `next` cannot be called from inside it.
    error = function(e) message("Skipping ", url_var, ": ", conditionMessage(e)))
    Sys.sleep(30)
}

And yes, Sys.sleep(30) makes R sleep for 30 seconds each time it is executed, so the loop pauses for 30 seconds before requesting the next web page.
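The key point is that an `error = function(e) ...` handler returns normally, so the loop keeps going. A small self-contained illustration, with a simulated failure standing in for a broken page (no scraping involved):

```r
results <- character(0)
for (i in 1:3) {
  tryCatch({
    if (i == 2) stop("simulated scrape failure")
    results <- c(results, paste("processed", i))
  },
  # The handler just reports the error; the loop continues afterwards
  error = function(e) message("skipped ", i, ": ", conditionMessage(e)))
  # A Sys.sleep() placed here would run on every iteration, failed or not
}
results  # "processed 1" "processed 3"
```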

See the well-written answer to How to write trycatch in R for a more detailed explanation of tryCatch.

ytu