I'm trying to extract data for text mining with html_nodes, using URLs that I have saved in an object called url. I have written a loop that reads and scrapes each URL.
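library(rvest)
a <- list()  # collect the paragraph text of each URL
for (i in url) {
  tex <- read_html(i)  # stops with an error if the host cannot be resolved
  p_text <- tex %>%
    html_nodes("p") %>%
    html_text()
  a[[i]] <- p_text  # store by URL instead of overwriting on every pass
}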
Because some of the URLs don't work, the following error appears:
Error in open.connection(x, "rb") : Could not resolve host: app.lo
I want to add the following to the loop: if a URL doesn't work, treat its text as blank and let the loop continue. This is a real problem because the loop stops at the first failing URL. I tried removing the bad URLs myself, but I have around 200,000 of them.
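
The behaviour I'm after could probably be sketched with base R's tryCatch, roughly as below. This is only a sketch under assumptions: scrape_paragraphs and its reader argument are hypothetical names I made up for illustration (reader defaults to rvest's read_html and exists only so the error handling can be tried without a network connection), and I have not run it against the full set of URLs.

```r
# Hypothetical helper: returns the <p> text of one URL, or "" if reading fails.
# `reader` is an assumed parameter, not part of rvest; it defaults to
# rvest::read_html and can be swapped out for offline experiments.
scrape_paragraphs <- function(u, reader = rvest::read_html) {
  tryCatch(
    {
      page <- reader(u)
      rvest::html_text(rvest::html_nodes(page, "p"))
    },
    error = function(e) {
      # Report the failing URL, then fall back to blank text
      message("Skipping ", u, ": ", conditionMessage(e))
      ""
    }
  )
}

# Intended use with the `url` vector from the question (left commented out
# because it needs network access):
# a <- lapply(url, scrape_paragraphs)
# names(a) <- url
```

Because the error is caught inside the helper, one failing URL should only produce a blank entry, and lapply would move on to the next one instead of stopping.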