I get an error when trying to scrape a news website: I checked, and page 32 is broken on the site itself. I would like to skip the error and keep scraping the rest of the URLs.
I have tried tryCatch to get past the broken link, but since I am quite new to R I do not know how to write it properly. Should I wrap read_html in that function? If so, how? My rough attempt is sketched after the code below.
library(rvest)
library(purrr)
library(stringr)

url_silla <- 'https://lasillavacia.com/buscar/farc?page=%d'

map_df(0:573, function(i) {
  # the format string contains a single %d, so only i is needed
  pagina <- read_html(sprintf(url_silla, i))
  print(i)  # progress indicator
  data.frame(titles = html_text(html_nodes(pagina, ".col-sm-12 h3")),
             date = html_text(html_nodes(pagina, ".date.col-sm-3")),
             category = html_text(html_nodes(pagina, ".category.col-sm-9")),
             tags = html_text(html_nodes(pagina, ".tags.col-sm-12")),
             link = paste0("https://www.lasillavacia.com",
                           str_trim(html_attr(html_nodes(pagina, "h3 a"), "href"))),
             stringsAsFactors = FALSE)
}) -> noticias_silla
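For what it's worth, this is roughly the shape I was attempting. It is only a sketch: the error handler that returns NULL and the empty data.frame returned for skipped pages are my guesses at how the skipping is supposed to work.

map_df(0:573, function(i) {
  # catch HTTP errors from read_html and return NULL instead of stopping
  pagina <- tryCatch(read_html(sprintf(url_silla, i)),
                     error = function(e) NULL)
  if (is.null(pagina)) {
    message("Skipping broken page ", i)
    return(data.frame())  # zero-row frame contributes nothing to map_df
  }
  print(i)
  data.frame(titles = html_text(html_nodes(pagina, ".col-sm-12 h3")),
             date = html_text(html_nodes(pagina, ".date.col-sm-3")),
             category = html_text(html_nodes(pagina, ".category.col-sm-9")),
             tags = html_text(html_nodes(pagina, ".tags.col-sm-12")),
             link = paste0("https://www.lasillavacia.com",
                           str_trim(html_attr(html_nodes(pagina, "h3 a"), "href"))),
             stringsAsFactors = FALSE)
}) -> noticias_silla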
Here is the error. Since print(i) runs after read_html, the [1] 31 means index 31 loaded fine and the request for page=32 is the one that fails, which matches the broken page. Thanks a lot for any help!
[1] 31
Error in open.connection(x, "rb") : HTTP error 500.
Called from: open.connection(x, "rb")