I'm trying to web-scrape a set of pages. However, from time to time my loop breaks because the parser fails with "failed to load HTTP resource". The pages in question don't load in my browser either, so it isn't a problem with the code.
However, it's quite annoying to have to restart the process and add an exception for each page that throws an error. I wonder if there is a way to handle this with a condition, something like: if an error occurs, skip to the next iteration of the loop.
I looked at the help page for htmlParse and found that it has an error argument, but I couldn't figure out how to use it. Any ideas for my condition? A rough sketch of what I have in mind is at the bottom of the post.
Below is a reproducible example:
if (!require(RCurl))  install.packages("RCurl")
if (!require(XML))    install.packages("XML")
if (!require(seqinr)) install.packages("seqinr")
library(RCurl); library(XML); library(seqinr)
listofTabelas <- list()  # collects one table per page

for (i in 575:585) {
  currentPage <- i  # starting page of the search

  # link that will be fetched
  link <- paste("http://www.cnj.jus.br/improbidade_adm/visualizar_condenacao.php?seq_condenacao=",
                currentPage,
                sep = '')

  doc <- htmlParse(link, encoding = "UTF-8")  # this will preserve characters
  tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

  if (length(tables) != 0) {
    tabela2 <- as.data.frame(tables[10])
    tabela2[, 1] <- gsub("\\n", " ", tabela2[, 1])
    tabela2[, 2] <- gsub("\\n", " ", tabela2[, 2])
    tabela2[, 2] <- gsub("\\t", " ", tabela2[, 2])
    listofTabelas[[i]] <- tabela2

    tabela1 <- do.call("rbind", listofTabelas)
    names(tabela1) <- c("Variaveis", "status")
  }
}
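
To make the question more concrete, this is roughly the pattern I'm imagining inside the loop above: wrap the htmlParse() call in tryCatch() and skip to the next iteration when it fails. It's only a sketch, and I'm not sure whether this or the error argument of htmlParse is the right way to do it:

  doc <- tryCatch(htmlParse(link, encoding = "UTF-8"),
                  error = function(e) NULL)  # return NULL instead of stopping the loop
  if (is.null(doc)) next                     # skip this page and move on to the next i
  tables <- readHTMLTable(doc, stringsAsFactors = FALSE)

Is something like this the idiomatic approach, or is there a cleaner way?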