I am trying to scrape titles and contents from a list of URLs using R. I am able to extract the title and content of each article individually. However, I need to loop through the list of URLs to get the title and content from each page.
The URLs are stored in a CSV file; for example: http://well.blogs.nytimes.com/2016/08/29/edible-sunscreens-all-the-rage-but-no-proof-they-work/?smid=fb-nytwell&smtyp=cur
http://www.nytimes.com/2016/08/29/opinion/why-we-never-die.html?smid=fb-nytwell&smtyp=cur
This is the code I used to extract a single article. Note that each paragraph of the content is a separate node, and when I extract these nodes each one ends up in a new row, while I need them all collapsed into the first row:
install.packages("xml2")
library(xml2)
library(rvest)

url <- "http://well.blogs.nytimes.com/2016/08/29/edible-sunscreens-all-the-rage-but-no-proof-they-work/?smid=fb-nytwell&smtyp=cur"
article <- read_html(url)

# The title is a single node; the content is one node per paragraph
title <- article %>% html_node(".entry-title") %>% html_text()
content <- article %>% html_nodes(".story-body-text") %>% html_text()

# `content` is a character vector with one element per paragraph, so
# data.frame() recycles `title` and produces one row per paragraph
article_table <- data.frame(title, content)
article_table
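One way to approach both problems is a sketch along the following lines: collapse the paragraph vector into a single string with `paste(collapse = ...)` so each article occupies one row, wrap the per-article logic in a function, and apply it over the URLs read from the CSV. The file name `urls.csv` and its column name `url` are assumptions, and the CSS selectors (`.entry-title`, `.story-body-text`) may differ between the blog pages and other nytimes.com pages, so they may need adjusting per site:

```r
library(xml2)
library(rvest)

# Assumed: a CSV with one column of URLs named "url"
urls <- read.csv("urls.csv", stringsAsFactors = FALSE)$url

scrape_article <- function(url) {
  article <- read_html(url)
  title <- article %>% html_node(".entry-title") %>% html_text()
  # Collapse all paragraph nodes into one string so the whole
  # article body sits in a single row of the result
  content <- article %>%
    html_nodes(".story-body-text") %>%
    html_text() %>%
    paste(collapse = " ")
  data.frame(title = title, content = content, stringsAsFactors = FALSE)
}

# One data frame row per URL
articles <- do.call(rbind, lapply(urls, scrape_article))
```

`lapply()` returns a list of one-row data frames, and `do.call(rbind, ...)` stacks them into a single data frame with one row per article.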