I have a problem handling errors in a for loop.

In the code below, I want to scrape data tables and combine them into one data frame.

During scraping, some address links do not work, so the loop stops with an error in the middle of the process. (Error location: doc = read_html(i, encoding = 'UTF-8').)

How can I skip the erroneous links and continue iterating over the whole vector?

library(rvest)
library(dplyr)

fdata = data.frame()
n = 1
for (i in data$address) {
  doc = read_html(i, encoding = 'UTF-8')  # the loop stops here when a link is broken
  dtable = doc %>% 
    html_table()
  fdata = bind_rows(fdata, dtable)
  len = length(data$address)
  print(n / len * 100)  # progress in percent
  n = n + 1
}
2 Answers


Simply wrapping the call in try() and skipping to the next iteration on error will do, e.g.

fdata = data.frame()
len = length(data$address)  # total number of links, computed once
n = 1
for (i in data$address) {
  doc = try(read_html(i, encoding = 'UTF-8'), silent = TRUE)
  print(n / len * 100)  # progress in percent, counted even for skipped links
  n = n + 1
  if (inherits(doc, 'try-error')) next  # skip the broken link
  dtable = doc %>% 
    html_table()
  fdata = bind_rows(fdata, dtable)
}
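
If you also want to know which links failed, the same pattern works with tryCatch(); a minimal sketch, where failed is a vector introduced here purely for illustration:

library(rvest)
library(dplyr)

fdata = data.frame()
failed = character(0)  # hypothetical accumulator for unreachable addresses
for (i in data$address) {
  doc = tryCatch(read_html(i, encoding = 'UTF-8'),
                 error = function(e) NULL)  # NULL marks a broken link
  if (is.null(doc)) {
    failed = c(failed, i)  # remember the address and move on
    next
  }
  fdata = bind_rows(fdata, html_table(doc))
}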

You can also use possibly from purrr to return a fallback value on errors: build a function that scrapes one table, then iterate and bind with map_dfr. Returning NULL for broken links is convenient here, since map_dfr() simply drops NULL results.

library(purrr)
library(rvest)

# possibly() returns the fallback value instead of raising an error
read_possible <- possibly(read_html, otherwise = NULL)

scrape_table <- function(address) {
  doc <- read_possible(address, encoding = 'UTF-8')
  if (is.null(doc)) {
    NULL  # map_dfr() drops NULL results, so broken links are skipped
  } else {
    html_table(doc)
  }
}

map_dfr(data$address, scrape_table)
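
As a quick sanity check, the wrapper can be called on a single unreachable address (the URL below is just a placeholder):

bad <- scrape_table("http://example.invalid/no-such-page")  # placeholder URL
is.null(bad)  # TRUE: the broken link yields NULL instead of an error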