
I imported a CSV file into R and want to use one of its columns, titled "URLs", which contains a list of URLs. I then want my code to scrape data from each URL. In short, I have about 200 links, so I want a more efficient way than listing them all in a c() call.

```
https://www.nytimes.com/2018/04/07/health/health-care-mergers-doctors.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/11/well/move/why-exercise-alone-may-not-be-the-key-to-weight-loss.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/07/health/antidepressants-withdrawal-prozac-cymbalta.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/well/why-you-should-get-the-new-shingles-vaccine.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/health/fda-essure-bayer-contraceptive-implant.html?rref=collection%2Fsectioncollection%2Fhealth
https://www.nytimes.com/2018/04/09/health/hot-pepper-thunderclap-headaches.html?rref=collection%2Fsectioncollection%2Fhealth
```

The error appears when running this line: `article <- links %>% map(read_html)`.

It gives me this message:

```
Error in UseMethod("read_xml") :
  no applicable method for 'read_xml' applied to an object of class "factor"
```

Here is the code:

```r
library(rvest)  # read_html(), html_node(), html_nodes(), html_text(), %>%
library(purrr)  # map(), map_chr()

setwd("C:/Users/Majed/Desktop")

d <- read.csv("NYT.csv")
d

links <- d$URLs

article <- links %>% map(read_html)

title <-
  article %>% map_chr(. %>% html_node("title") %>% html_text())

content <-
  article %>% map_chr(. %>% html_nodes(".story-body-text") %>% html_text() %>% paste(., collapse = ""))

article_table <- data.frame("Title" = title, "Content" = content)
```



Pay attention to the meaning of your error message: `read_html` expects a character string, but you're giving it a factor. In R versions before 4.0.0, `read.csv` converts strings to factors unless you include the argument `stringsAsFactors = FALSE` (since R 4.0.0, `FALSE` is the default). `read_csv` from readr is a good alternative if you, like me, forget that you don't want strings automatically turned into factors.
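To see the difference without the real `NYT.csv`, here is a small sketch that feeds two of the URLs from the question (query strings dropped for brevity) to `read.csv` through its `text` argument. The inline CSV and the names `d_factor`, `d_chr` are illustrative stand-ins; the assumption is a single column named `URLs`, as described in the question.

```r
# Inline CSV standing in for NYT.csv (assumption: one column named "URLs")
csv_text <- "URLs
https://www.nytimes.com/2018/04/07/health/health-care-mergers-doctors.html
https://www.nytimes.com/2018/04/09/health/hot-pepper-thunderclap-headaches.html"

# Old (pre-4.0.0) default made explicit: strings become a factor
d_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(d_factor$URLs)         # "factor"
as.integer(d_factor$URLs)    # integer level codes, not the URLs themselves

# Keep the column as a character vector instead
d_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(d_chr$URLs)            # "character"

# Converting an existing factor column gives the same character vector
links <- as.character(d_factor$URLs)
identical(links, d_chr$URLs) # TRUE
```

A factor is stored as integer codes plus a table of labels, which is why `read_xml` has no method for it; `as.character()` recovers the labels in their original order.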

I can't reproduce the problem without your data, but try converting the URLs to strings:

```r
links <- as.character(d$URLs)

article <- links %>% map(read_html)
```
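With about 200 pages, `map(read_html)` runs silently until every page is downloaded and aborts on the first URL that errors. A sketch of a more transparent alternative is a plain loop that prints progress, records failures as `NULL` and pauses between requests. `fetch_all`, `fetch` and `pause` are hypothetical names introduced here, not part of the answer or of rvest. The default `fetch` is `xml2::read_html`, which rvest re-exports as `read_html`; it is exposed as a parameter only so the loop can be tried offline with a stand-in function.

```r
# Hypothetical helper (not from the answer): fetch each URL one at a time,
# print progress, and record failures as NULL instead of aborting the run.
fetch_all <- function(urls, fetch = xml2::read_html, pause = 1) {
  urls <- as.character(urls)            # guard against factor input
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    message(sprintf("[%d/%d] %s", i, length(urls), urls[i]))
    page <- tryCatch(
      fetch(urls[i]),
      error = function(e) {
        message("  failed: ", conditionMessage(e))
        NULL
      }
    )
    # results[i] <- list(page) keeps a NULL in its slot;
    # results[[i]] <- NULL would delete the slot instead
    results[i] <- list(page)
    if (pause > 0 && i < length(urls)) Sys.sleep(pause)  # wait between requests
  }
  results
}

# Usage with the question's data (needs xml2/rvest and network access):
# article <- fetch_all(d$URLs)
# article <- Filter(Negate(is.null), article)  # drop failed pages before map_chr()
```

Dropping the `NULL` entries before the `map_chr()` extraction steps matters, because `html_node()` has no meaningful result for a page that was never downloaded.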
camille
  • Thank you for your answer. I tried this; however, the second line does not finish. It runs for a long time and nothing comes out of it. – Majed Alghamdi Apr 12 '18 at 16:46
  • You'll need to include at least some of your data for anyone to try to reproduce the problem. – camille Apr 12 '18 at 16:50
  • I included some of the URLs in the post. – Majed Alghamdi Apr 12 '18 at 16:57
  • This doesn't tell me what your data looks like. Please see [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for how to make a reproducible question. That way it will be easier for us to help you. – camille Apr 12 '18 at 17:00
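One common way to share data in a reproducible form is `str()` plus `dput()` on a small sample. The sketch below builds a stand-in `d` inline from two of the posted URLs; with the real file you would start from `d <- read.csv("NYT.csv")` and paste the printed `dput()` output into the question.

```r
# Stand-in for the question's data frame (assumption: one column named "URLs")
d <- data.frame(
  URLs = c(
    "https://www.nytimes.com/2018/04/07/health/health-care-mergers-doctors.html",
    "https://www.nytimes.com/2018/04/09/health/hot-pepper-thunderclap-headaches.html"
  ),
  stringsAsFactors = TRUE  # mimics read.csv() in R < 4.0.0
)

str(d)            # shows the column type, here a factor
dput(head(d, 2))  # prints R code that rebuilds this sample exactly
```

The `str()` output alone would have revealed the factor column behind the original error.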