
I would like to scrape the following page to build a data frame with the list of names and emails. However, the code below returns this error at the read_html call:

Error in open.connection(x, "rb") : schannel: SNI or certificate check failed: SEC_E_WRONG_PRINCIPAL (0x80090322) - The target principal name is incorrect.

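library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

# Download and parse the page (this is the call that raises the error)
r <- read_html("https://www.biologie.lmu.de/personen/index.html")
# Extract the text of every table cell
b <- r %>%
  html_nodes('td') %>%
  html_text()
# Strip padding and newlines
b <- gsub("  ", "", b)
b <- gsub('\n\n\n\n\n\n', '_', b, fixed = TRUE)
b <- gsub('\n', '', b, fixed = TRUE)
# Positions of the cells that contain an email address
w <- which(grepl('@', b))
# Reshape into rows, using the position of the first email as the column count
d <- data.frame(matrix(b, ncol = w[1], byrow = TRUE), stringsAsFactors = FALSE)
d <- data.frame(people_name = d$X1, people_links = NA, emails = d[, w[1]], university = "LMU Munich")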

P.S. When I open the website in my browser, it warns that the connection is not safe.

Giulia
  • `One or more of this website's certificates are invalid, so we can't guarantee its authenticity.` - OK, I would avoid this site in that case. Is there a related safe URL? – QHarr Feb 20 '21 at 19:52
  • Related (but not recommended by me)? https://stackoverflow.com/questions/24793863/devtoolsinstall-github-ignore-ssl-cert-verification-failure/24794044#24794044 Basically, ignore cert errors. Dangerous. – QHarr Feb 20 '21 at 23:31
  • Thank you @QHarr, I am puzzled by that. The URL is simply a university website, so I think it is safe, but for some reason it has a weird protection on it. Could it be a Windows firewall feature? Or something similar? – Giulia Feb 22 '21 at 09:24
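
For reference, the "ignore cert errors" approach mentioned in the comments could look roughly like the sketch below. It assumes the httr package is installed and uses two hypothetical helper names, fetch_insecure() and cell_text(), that are not part of the question. It switches off both the certificate check and the hostname check, so the response could come from an impostor server; treat it as a last resort for public, non-sensitive pages, not as a recommended fix.

```r
library(httr)   # GET(), config(), stop_for_status(), content()
library(rvest)  # read_html(), html_nodes(), html_text() and %>%

# Hypothetical helper: fetch a page while skipping TLS certificate and
# hostname verification. Unsafe: the server's identity is not checked.
fetch_insecure <- function(url) {
  resp <- GET(url, config(ssl_verifypeer = 0L, ssl_verifyhost = 0L))
  stop_for_status(resp)                # raise an R error on HTTP 4xx/5xx
  read_html(content(resp, as = "raw")) # parse the raw response body
}

# Hypothetical helper: text of every table cell, trimmed of whitespace.
cell_text <- function(page) {
  page %>%
    html_nodes("td") %>%
    html_text(trim = TRUE)
}

# Usage (requires network access):
# page <- fetch_insecure("https://www.biologie.lmu.de/personen/index.html")
# b <- cell_text(page)
```

The resulting vector b could then go through the same grepl('@', b) and matrix() steps as in the question. Whether this gets past the schannel error on a given Windows setup is not something the sketch can guarantee.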

0 Answers