I am trying to scrape several items from a web page. The same kind of content appears on multiple consecutively numbered pages, so I have built a for loop to go through them.

Sometimes the server comes back busy (this happens randomly), and the scrape of that element has to be retried. I have written code for this, and it works well.

However, within the range covered by the for loop, a few pages may not exist. I need to skip those pages entirely and move on to the next one.

I found this link to another example, but I have been unable to adapt it to my code. How would I build a try/catch approach that handles both cases inside the for loop: retrying when the server is busy, and skipping to the next URL when the current page does not exist?

Here is my code example (without too many specifics):

    library(rvest)

    DatasetAll <- NULL
    urllink <- 1:100
    for (i in urllink) {
      url <- paste0("http://www.somewebpage/link", i, ".html")

      # Retry until the Name column is scraped without an error
      while (TRUE) {
        Name <- try(url %>%
          read_html() %>%
          html_nodes(xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "datarow", " " ))]//td[(((count(preceding-sibling::*) + 1) = 2) and parent::*)]') %>%
          html_text())
        if (!inherits(Name, "try-error")) break
      }
      Namedf <- data.frame(Name)

      # Same retry loop for the Score column
      while (TRUE) {
        Score <- try(url %>%
          read_html() %>%
          html_nodes(xpath = '//*[contains(concat( " ", @class, " " ), concat( " ", "datarow", " " ))]//td[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]') %>%
          html_text())
        if (!inherits(Score, "try-error")) break
      }
      Scoredf <- data.frame(Score)

      Dataset <- cbind(Namedf, Scoredf)
      # other_calculation <- (transform the scraped data; omitted here)
      DatasetAll <- rbind(Dataset, DatasetAll)
    }
    write.table(DatasetAll, file = "C:/My/Location.csv", sep = ",", row.names = FALSE, qmethod = "double")
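
To make the intended behaviour concrete, here is a rough sketch of the control flow I am after. It is not working scraper code: `fetch_page`, `http_fetch` and `fake_fetch` are hypothetical names, and I am assuming that a missing page can be recognised by an HTTP 404 status while a busy server shows up as some other non-200 status or an error. `http_fetch` assumes the httr package; the demo loop uses a stub so it runs without a network connection.

```r
# Hypothetical helper: fetch one page, retrying while the server is busy.
# `fetch` is any function taking a URL and returning list(status = <int>, body = <chr>).
# Returns the page body, or NULL when the page is missing or retries run out.
fetch_page <- function(url, fetch, max_tries = 5, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    res <- tryCatch(fetch(url), error = function(e) NULL)
    if (!is.null(res)) {
      if (res$status == 404) return(NULL)      # assumed: 404 means the page does not exist
      if (res$status == 200) return(res$body)  # success
    }
    Sys.sleep(wait)                            # busy server or network error: wait, then retry
  }
  NULL                                         # still failing after max_tries
}

# Possible real fetcher (assumes httr is installed; not called in this demo).
http_fetch <- function(url) {
  resp <- httr::GET(url)
  list(status = httr::status_code(resp),
       body = httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Stub standing in for the site: page 2 is missing, page 3 is busy on its first request.
calls <- 0
fake_fetch <- function(url) {
  if (grepl("link2", url)) return(list(status = 404L, body = ""))
  if (grepl("link3", url)) {
    calls <<- calls + 1
    if (calls == 1) return(list(status = 503L, body = ""))
  }
  list(status = 200L, body = paste("<html>", url, "</html>"))
}

scraped <- character(0)
for (i in 1:3) {
  url <- paste0("http://www.somewebpage/link", i, ".html")
  body <- fetch_page(url, fake_fetch, wait = 0)
  if (is.null(body)) next      # skip this page and move on to the next URL
  scraped <- c(scraped, url)   # the real loop would parse `body` and extract Name and Score here
}
```

In this sketch a page that stays busy for all `max_tries` attempts is skipped in the same way as a missing page. Fetching each page once and extracting both columns from the same body would also avoid requesting every URL twice.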