
I would like to scrape only the candidate names from these tables, along with the votes reported in the third column (after the image and the candidate name).
This is as far as I've gotten.

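library(rvest)
ndp_leadership <- url('https://en.wikipedia.org/wiki/New_Democratic_Party_leadership_elections')
# read_html() takes only the page here; its second argument is an encoding, not a selector
results <- read_html(ndp_leadership)

# select every table, then try to pull the cells of tables that mention 'Candidate'
results <- html_nodes(results, 'table')
out <- results %>%
  html_nodes(xpath = "//*[contains(., 'Candidate')]//tr/td")
out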
Dave2e
spindoctor
  • So what exactly is your question? "Please do this for me" is not a question. Since Wikipedia pages can be edited at any time, it's not helpful to use one as example data. Try to include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in the question itself. – MrFlick Nov 06 '17 at 18:53
  • Rather than close, adding an XPath tag to re-gear the question in that direction, since it's definitely more than R-related. – hrbrmstr Nov 06 '17 at 19:19

1 Answer


Although this doesn't really use XPath, here's one way to do it:

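library(purrr)  # provides map(); rvest re-exports %>%

results <- read_html(ndp_leadership) %>%
  html_nodes(".wikitable") %>%
  html_table(fill = TRUE) %>%
  map(~ .[[2]]) %>%  # second column holds the candidate names
  unlist() %>%
  setdiff(c("Candidate", "Total"))  # drop header and total labels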
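The pipeline above returns only the names. If the votes in the third column are wanted too, a relative XPath per row is one possible approach. The sketch below is not tested against the live page: it runs on a small inline table whose layout (image cell, name, votes, percent) and whose names and numbers are made up for illustration, so the cell positions are an assumption to check against the real tables.

```r
library(rvest)

# Hypothetical stand-in for one results table; the live page may differ.
page <- read_html('
<table class="wikitable">
  <tr><th></th><th>Candidate</th><th>Votes</th><th>%</th></tr>
  <tr><td></td><td>Alice Example</td><td>1,234</td><td>61.7</td></tr>
  <tr><td></td><td>Bob Sample</td><td>766</td><td>38.3</td></tr>
  <tr><td></td><td>Total</td><td>2,000</td><td>100</td></tr>
</table>')

# Keep only rows that contain data cells, which skips the header row.
rows <- html_nodes(page, xpath = "//table[contains(@class, 'wikitable')]//tr[td]")

# A relative XPath (leading ./) picks the 2nd and 3rd cell within each row.
candidate <- html_text(html_node(rows, xpath = "./td[2]"), trim = TRUE)
votes     <- html_text(html_node(rows, xpath = "./td[3]"), trim = TRUE)

results <- data.frame(
  candidate = candidate,
  votes = as.integer(gsub(",", "", votes)),  # drop thousands separators
  stringsAsFactors = FALSE
)

# Remove the summary row, mirroring the setdiff() step above.
results <- results[results$candidate != "Total", ]
```

Working row by row keeps each name paired with its vote count, which a flattened vector of cells would not guarantee. On the real page, rows that use colspan or rowspan would shift the cell positions, so the td indices may need adjusting.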
David Klotz