I am currently trying to scrape a website with a combination of RSelenium, rvest, and the tidyverse.
The goal is to go to this website (https://www.pricecharting.com/category/pokemon-cards), click on one of the links (for instance, "Promo"), and then extract the entire table of data (e.g., card and graded prices) using rvest.
I was able to get the table extracted without too much of an issue using the following code:
library(RSelenium)
library(rvest)
library(tidyverse)
## read the static HTML of the Promo set page and pull the prices table
pokemon <- read_html("https://www.pricecharting.com/console/pokemon-promo")
price_table <- pokemon %>%
  html_elements("#games_table") %>%
  html_table()
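Since html_table() returns a list of tibbles (one per matched node), I just grab the first element when I want the table itself (the name promo_prices is mine):
## html_table() gives back a list of tibbles; the table itself is the first element
promo_prices <- price_table[[1]]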
However, this has a couple of issues: 1) I cannot go through all the different card sets on the initial website link I provided (https://www.pricecharting.com/category/pokemon-cards), and 2) I cannot extract the entire table with this method - only what is initially loaded.
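(For the first issue, I imagine I could collect the link to each set straight from the category page with rvest, something like the sketch below - the "/console/pokemon" filter is just a guess on my part based on the Promo URL above - but that still would not solve the partial-table problem.)
## sketch: grab the href of every set link on the category page
category <- read_html("https://www.pricecharting.com/category/pokemon-cards")
set_links <- category %>%
  html_elements("a") %>%
  html_attr("href") %>%
  str_subset("/console/pokemon")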
To mitigate these issues I was looking into RSelenium. What I decided to do was go to the initial website, click on the link to a card set (e.g., "Promo"), and then load the entire page. This workflow is shown here:
## open driver
rD <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
remDr <- rD[["client"]]
## navigate to primary page
remDr$navigate("https://www.pricecharting.com/category/pokemon-cards")
## click on the link I want
remDr$findElement(using = "link text", "Promo")$clickElement()
## find the table
table <- remDr$findElement(using = "id", "games_table")
## load the entire table
table$sendKeysToElement(list(key = "end"))
## get the entire source
full_table <- remDr$getPageSource()[[1]]
## read in the table
html_page <- read_html(full_table)
## Do the `rvest` technique I had above.
html_page %>%
  html_elements("#games_table") %>%
  html_table()
However, my issue is that I am once again getting the same 51 elements instead of the entire table.
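My guess is that the rows are lazy-loaded as the page is scrolled, so the next thing I was planning to try is a scroll loop along these lines (untested sketch - the "#games_table tr" selector and the two-second wait are assumptions on my part):
## keep scrolling to the bottom until the number of table rows stops growing
old_n <- 0
repeat {
  ## dummy arg avoids the empty-args quirk in some RSelenium versions
  remDr$executeScript("window.scrollTo(0, document.body.scrollHeight);", args = list(1))
  Sys.sleep(2)
  rows <- remDr$findElements(using = "css selector", "#games_table tr")
  if (length(rows) == old_n) break
  old_n <- length(rows)
}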
I am wondering whether it is possible to combine my two techniques, and where my code is going wrong.