I am new to R and have not been able to skip URLs that, according to the website, return: “504 error That content doesn't seem to exist…”. The website has a list of people; I need the table plus the information in the nested link for each person. Only one person's page (the 84th) gives the 504 error, so I would like to know how to skip that page and have that person's page marked as non-existent in my data frame. Thanks for your help.

Here is my code:

library(rvest)
library(dplyr)
library(stringr)
library(jsonlite)
library(readr)

# Fetch the ranking table as JSON
url = "https://www.barrons.com/advisor/report/top-financial-advisors/100?id=/100/2022&type=ranking_tables"
doc = fromJSON(txt = url)
result = doc$data$data
print(result)

# Split the Advisor field on single quotes; column 4 holds each advisor's link
link = str_split_fixed(doc$data$data$Advisor, "\'", n = Inf)
advisor_links = link[, 4]

# Visit each advisor page and print the position text
for (i in 1:length(advisor_links)) {
  name_link = advisor_links[i]
  advisor_page = read_html(name_link)
  position = advisor_page %>%
    html_nodes(".BarronsTheme--lg--18rTokdG p:nth-child(1)") %>%
    html_text() %>%
    paste(collapse = ",")
  print(position)
}
Zah

1 Answer


If you know the index of the person you want to remove, you can simply drop it from advisor_links before you run your for loop.

advisor_links <- advisor_links[-84]
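
Note that negative indexing shortens the vector, so the dropped advisor disappears from the results altogether. A minimal sketch on a made-up vector shows the difference between dropping the entry and keeping it with a marker; the names links, bad, status, and kept are illustrative only and do not come from the question:

```r
links <- c("link1", "link2", "link3")  # stand-in for advisor_links
bad <- 2                               # stand-in for the failing index (84 in the question)

# Dropping: the entry is removed and later entries shift down by one
dropped <- links[-bad]

# Alternative: keep every entry and record which one is unavailable
status <- rep("ok", length(links))
status[bad] <- "Non-existent"
kept <- data.frame(link = links, status = status, stringsAsFactors = FALSE)
print(kept)
```

Keeping the full vector is probably the better fit if the final data frame needs one row per advisor.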

If several of the pages might error out, I would suggest using the tryCatch function (see "How to write trycatch in R") and putting it inside your for loop, like so:

for (i in seq_along(advisor_links)) {
    name_link = advisor_links[i]
    tryCatch({
        advisor_page = read_html(name_link)
        # Collapse the matched nodes into one string; no match gives ""
        position = advisor_page %>%
          html_nodes(".BarronsTheme--lg--18rTokdG p:nth-child(1)") %>%
          html_text() %>%
          paste(collapse = ",")
        if (position == "") print("Non-existent") else print(position)
    },
    # A failed request (e.g. the 504 page) is skipped without stopping the loop
    error = function(e) NULL)
}
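
To keep every advisor and get NA for the ones whose page fails, one option is to collect the results into a vector instead of printing them. The following is a minimal sketch under assumptions, not something tested against the live site: scrape_position, collect_positions, fake_fetch, and demo_links are hypothetical names, and the demonstration uses a stand-in fetcher so it runs offline. Only the CSS selector is taken from the question.

```r
# Extract the position text from one advisor page (selector copied from the question).
# Defined here for use with the real links; it is not called in the demo below.
scrape_position <- function(link) {
  page  <- rvest::read_html(link)
  nodes <- rvest::html_nodes(page, ".BarronsTheme--lg--18rTokdG p:nth-child(1)")
  paste(rvest::html_text(nodes), collapse = ",")
}

# Hypothetical helper: apply `fetch` to every link and record NA where it
# raises an error or returns "", so the result stays aligned with `links`.
collect_positions <- function(links, fetch = scrape_position) {
  vapply(links, function(link) {
    tryCatch({
      position <- fetch(link)
      if (identical(position, "")) NA_character_ else position
    }, error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

# Offline demonstration with a stand-in fetcher that fails for one link
fake_fetch <- function(link) {
  if (link == "bad") stop("HTTP error 504")
  paste("Position for", link)
}
demo_links <- c("a", "bad", "b")
demo <- data.frame(link = demo_links,
                   position = collect_positions(demo_links, fake_fetch),
                   stringsAsFactors = FALSE)
print(demo)
```

With the real data, the call would presumably be collect_positions(advisor_links), and the resulting vector can be added as a column next to the advisor table since it has one entry per link.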
John Manacup
  • First of all, thank you so much for your quick response. I need to keep all the advisors and get a "Non-existent" message only for those whose link does not work. I changed the code a little, but it only shows the error and I still can't use the data. I need output like this: (([1]"Managing Director" [1]"Managing Director" [1]"Partner, Wealth Advisor" [1]"Non-existent" [1]"Partner, Wealth Advisor")) I would appreciate it if you could let me know how to fix this. – Zah Oct 07 '22 at 16:58
  • Edited. `tryCatch` can be a little bit wonky sometimes. It does not catch the error even though the code errors out, but knowing that `position` is an empty string, we can just print `"Non-existent"`. – John Manacup Oct 07 '22 at 21:42
  • Since you removed `function(e)` from the code, it still runs into the error. I tried combining it with the if/else check, but that still doesn't solve my problem: I need the data for all of the advisors, and if the link for one of them is unavailable, the code should output "NA" for that specific advisor. – Zah Oct 08 '22 at 01:02
  • I forgot to add the `error` argument. This should solve it now without any error. – John Manacup Oct 08 '22 at 05:02