Thanks to Stack Overflow, I have been able to use the following code to download a series of photos from a public website.
library(rvest)  # provides html_session(), html_nodes(), html_attr()

urls <- c(
  "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/0090/13",
  "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/0089/13",
  "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/0088/13",
  "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/0087/13",
  "https://ec.europa.eu/consumers/consumers_safety/safety_products/rapex/alerts/?event=viewProduct&reference=A12/0086/13"
)
for (url in 1:length(urls)) {
  print(url)
  # open the alert page and pull the src attribute of every <img> tag
  webpage     <- html_session(urls[url])
  link.titles <- webpage %>% html_nodes("img")
  img.url     <- link.titles %>% html_attr("src")
  # save each image as <page index>.<image index>.jpg
  for (j in 1:length(img.url)) {
    download.file(img.url[j], paste0(url, '.', j, ".jpg"), mode = "wb")
  }
}
However, some of the links contain no photos, so the request or download returns an HTTP status error and the whole loop stops.
So, I want to insert an if statement that tells R to skip any page that contains no photos or returns a '404 Not Found' error. The problem is that I do not know what function or condition would detect a page with no images or a '404 Not Found' response. Any suggestions would be appreciated.
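To make what I am after a little more concrete, below is roughly the shape I am imagining: a length() check on the scraped src attributes plus a tryCatch() wrapper around each page. This is untested and only my guess at how the skipping could work, which is exactly the part I would like advice on.

# Untested sketch: skip pages with no usable <img> src, and catch errors
# (e.g. a '404 Not Found') so one bad page does not stop the loop.
for (url in 1:length(urls)) {
  print(url)
  tryCatch({
    webpage <- html_session(urls[url])
    img.url <- webpage %>% html_nodes("img") %>% html_attr("src")
    img.url <- img.url[!is.na(img.url)]   # ignore <img> tags without a src
    if (length(img.url) > 0) {            # only download when photos exist
      for (j in seq_along(img.url)) {
        download.file(img.url[j], paste0(url, '.', j, ".jpg"), mode = "wb")
      }
    }
  }, error = function(e) {
    # any HTTP/download error for this page ends up here and is ignored
    message("Skipping page ", url, ": ", conditionMessage(e))
  })
}

Is something like this the right approach, or is there a better way to test for the missing images / 404 case?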