Scraping only some columns from multiple tables

Question

I would like to scrape only the candidate names from these tables, along with the votes reported in the third column (the one after the image and the candidate name).
This is as far as I've gotten:

library(rvest)
ndp_leadership <- 'https://en.wikipedia.org/wiki/New_Democratic_Party_leadership_elections'
results <- read_html(ndp_leadership)

results <- html_nodes(results, 'table')
out <- results %>%
  html_nodes(xpath = "//*[contains(., 'Candidate')]//tr/td")
out

So what exactly is your question? "Please do this for me" is not a question. Since Wikipedia pages can be edited at any time, they're not helpful as example data. Try to include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in the question itself. — MrFlick, Nov 06 '17 at 18:53
Rather than close, I'm adding an XPath tag to re-gear the question in that direction, since it's definitely more than R-related. — hrbrmstr, Nov 06 '17 at 19:19

score 0 · Accepted Answer · answered Nov 06 '17 at 20:10


Although this doesn't really use XPath, here's one way to do it:

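library(rvest)
library(purrr)  # provides map()

results <- read_html(ndp_leadership) %>%
  html_nodes(".wikitable") %>%        # select every table with class "wikitable"
  html_table(fill=TRUE) %>%           # parse each table into a data frame
  map(~ .[,2]) %>%                    # keep the second column (candidate names)
  unlist %>% 
  setdiff(., c("Candidate", "Total")) # drop header and total entries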
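The snippet above returns only the names. If the votes are wanted as well, one possible extension is to keep columns 2 and 3 of each table and stack them into a single data frame. The sketch below is only an illustration: to stay reproducible it parses a small inline HTML string instead of the live page, and the table contents ("Candidate A", the vote counts) and the output column names `candidate` and `votes` are made-up placeholders, not data from Wikipedia. It also assumes every table has the layout image, candidate name, votes.

```r
library(rvest)

# Hypothetical stand-in for the Wikipedia page: two tables whose columns are
# an empty image cell, the candidate name, and the vote count.
page <- read_html('<html><body>
<table class="wikitable">
  <tr><th></th><th>Candidate</th><th>Votes</th></tr>
  <tr><td></td><td>Candidate A</td><td>100</td></tr>
  <tr><td></td><td>Candidate B</td><td>50</td></tr>
  <tr><td></td><td>Total</td><td>150</td></tr>
</table>
<table class="wikitable">
  <tr><th></th><th>Candidate</th><th>Votes</th></tr>
  <tr><td></td><td>Candidate C</td><td>75</td></tr>
</table>
</body></html>')

# Parse every .wikitable into a data frame.
tables <- page %>%
  html_nodes(".wikitable") %>%
  html_table()

# Keep columns 2 and 3 by position, under fixed names, so tables with
# different headers can be stacked. Votes stay as character strings because
# real pages may format them with commas or percentages.
pick_columns <- function(tbl) {
  data.frame(candidate = as.character(tbl[[2]]),
             votes = as.character(tbl[[3]]),
             stringsAsFactors = FALSE)
}
results <- do.call(rbind, lapply(tables, pick_columns))

# Drop repeated header rows and total rows, as in the answer above.
results <- results[!results$candidate %in% c("Candidate", "Total"), ]
results
```

On the real page, `page` would be `read_html(ndp_leadership)`, and the column positions should be checked against each table, since the layouts may differ between elections.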


David Klotz
