When I run the code below, I get the following error:

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class "try-error" to a data.frame

I am using try() to skip the links that do not work and continue with the loop, but that is not happening. Can someone help me with this?

base_url <- c("https://www.sec.gov/Archives/edgar/data/1409916/000162828017002570/exhibit211nobilishealthcor.htm",
              "https://www.sec.gov/Archives/edgar/data/1300317/000119312507128181/dex211.htm",
              "https://www.sec.gov/Archives/edgar/data/1453814/000145381417000063/subsidiariesoftheregistran.htm",
              "https://www.sec.gov/Archives/edgar/data/25743/000138713117001111/ex21-1.htm",
              "https://www.sec.gov/Archives/edgar/data/880631/000119312517065534/d280058dex211.htm",
              "https://www.sec.gov/Archives/edgar/data/1058290/000105829017000008/ctshexhibit21112312016.htm",
              "https://www.sec.gov/Archives/edgar/data/1031927/000141588916005383/ex21-1.htm",
              "https://www.sec.gov/Archives/edgar/data/1358071/000135807118000008/cxoexhibit211.htm",
              "https://www.sec.gov/Archives/edgar/data/904979/000090497918000006/exhibit211q4fy17listofsubs.htm",
              "https://www.sec.gov/Archives/edgar/data/41296/000094420901500099/dex21.txt",
              "https://www.sec.gov/Archives/edgar/data/808461/000080846117000024/gciexhibit21-1123116.htm",
              "https://www.sec.gov/Archives/edgar/data/1101026/000107878213000519/f10k123112_ex21.htm",
              "https://www.sec.gov/Archives/edgar/data/932372/000141588915000759/ex21-1.htm"
              )

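library(rvest)  # provides read_html(), html_nodes(), html_table()

df <- lapply(base_url, function(u) {
  try({
    html_obj <- read_html(u)
    draft_table <- html_nodes(html_obj, 'table')
    # CIK is taken from fixed character positions in the URL
    cik <- substr(u, start = 41, stop = 47)
    draft1 <- html_table(draft_table, fill = TRUE)
    final <- c(cik, draft1)
  }, silent = TRUE)
})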


library(reshape2)  # melt()
library(zoo)       # na.locf()
data <- melt(df)
data <- as.data.frame(data, row.names = NULL)
data <- data[, 1:2]
names(data) <- c("CIK", "Company")

data2 <- transform(data, CIK = na.locf(CIK))

3 Answers

You could use purrr's safely function. It wraps a function so that each call returns a list with two elements, the result and the error (NULL if there was none), so a failing URL does not stop the loop.

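library(tidyverse)
library(rvest)  # not attached by library(tidyverse)

checklinks <- function(url) {
  # the first run of digits in the URL is the CIK
  cik <- url %>%
    str_extract("[:digit:]+")
  table <- read_html(url) %>%
    html_nodes("table") %>%
    html_table() %>%
    bind_rows() %>%
    # na_if() on a whole data frame fails in recent dplyr versions,
    # so apply it to each character column instead
    mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
    filter(rowMeans(is.na(.)) < 1) %>%
    mutate(cik = cik) %>%
    select(cik, everything())
  return(table)
}

final <- base_url %>%
  map(safely(checklinks)) %>%
  transpose() %>%
  .$result %>%
  bind_rows()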
TTR
  • After getting the output in 'links_t', how should I get the output into a data frame? If I use the code below, it throws an error: `require(reshape2); data <- melt(df); data <- as.data.frame(links_t, row.names = NULL)` – Gautam Biswas Mar 08 '18 at 06:45
  • I want to get the output in a data frame – Gautam Biswas Mar 08 '18 at 07:16
  • Can you please help me get the data into a data frame? – Gautam Biswas Mar 08 '18 at 14:22
  • I've edited the above code to help you get the data into a data frame. However, it's not an easy task considering the sites you provide: they are a mixture of different tables, and one is even a txt file. The above code works for all but two URLs. – TTR Mar 13 '18 at 16:59

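try does not skip anything on its own: if the expression fails, it returns an object of class "try-error" instead of stopping.

So afterwards you could filter out the failed elements, for example:

# inherits() is safer than comparing class() directly, since class() can return several values
check <- !vapply(df, inherits, logical(1), what = "try-error")
df <- df[check]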

Or use tryCatch directly, which here returns NULL for a failed URL:

df <- lapply(base_url, function(u) {
  tryCatch({
    html_obj <- read_html(u)
    draft_table <- html_nodes(html_obj,'table')
    cik <- substr(u,start = 41,stop = 47)
    draft1 <- html_table(draft_table,fill = TRUE)
    final <- c(cik,draft1)
  }, error = function(x) NULL)
})
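
With tryCatch, the NULL entries usually need to be dropped before the list is combined. A minimal offline sketch, where fetch and the urls vector are hypothetical stand-ins for the scraping step and the real links:

```r
# Hypothetical stand-in for the scraping step: fails for some inputs
fetch <- function(u) {
  if (grepl("bad", u)) stop("cannot open ", u)
  data.frame(url = u, n = nchar(u))
}

urls <- c("good-1", "bad-2", "good-3")

# Failed inputs yield NULL instead of stopping lapply()
res <- lapply(urls, function(u) {
  tryCatch(fetch(u), error = function(e) NULL)
})

# Drop the NULLs left behind by failed inputs before combining
ok <- Filter(Negate(is.null), res)
combined <- do.call(rbind, ok)
```

Here combined only contains rows for the inputs that succeeded.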
RolandASc
  • Is there any other way I can do this? I want the loop to continue scraping the sites rather than throw an error and stop. – Gautam Biswas Mar 07 '18 at 12:44
  • I am not sure what you mean; like this, no error should be thrown and the loop should continue. I added an alternative with `tryCatch`, but with that you still have the `NULL`s, which you may need to remove depending on how you want to continue – RolandASc Mar 07 '18 at 12:48
  • It is throwing an error: Error in names(object) <- nm : 'names' attribute [1] must be the same length as the vector [0] – Gautam Biswas Mar 07 '18 at 13:00
  • Where exactly now? How have you modified your code thus far? – RolandASc Mar 07 '18 at 13:09

You can try something like this.

for (i in something) {
  res <- try(expression_to_get_data)
  if (inherits(res, "try-error")) {
    # error handling code; to just skip this iteration use next
    # (R has no `continue` keyword)
    next
  }
  # rest of the iteration for the no-error case
}
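
As a runnable illustration of the pattern above, here is a small sketch that needs no network access. The inputs list and the use of log() as the failing step are assumptions chosen only for the demo.

```r
inputs <- list(1, "a", 3)
results <- list()

for (i in seq_along(inputs)) {
  # log() fails on the character input; try() captures that error
  res <- try(log(inputs[[i]]), silent = TRUE)
  if (inherits(res, "try-error")) {
    message("skipping element ", i)
    next  # move on to the next iteration
  }
  # Only reached when no error occurred
  results[[length(results) + 1]] <- res
}
```

After the loop, results holds only the values from the iterations that succeeded.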

Source of solution

Domnick