
I'm using RSelenium to do some simple Google searches. Setup:

library(tidyverse)
library(RSelenium) # running docker to do this
library(rvest)
library(httr)

remDr <- remoteDriver(port = 4445L, browserName = "chrome")
remDr$open()

remDr$navigate("https://books.google.com/")
books <- remDr$findElement(using = "css", "[name = 'q']")

books$sendKeysToElement(list("NHL teams", key = "enter"))

bookElem <- remDr$findElements(using = "css", "h3.LC20lb")

That's the easy part. Now, there are 10 links on that first page, and I want to click each link, go back, and then click the next link. What's the most efficient way to do that? I've tried the following:

bookElem$clickElement() 

This returns `Error: attempt to apply non-function`. I expected it to click the first link, but it doesn't. (It does work if I use findElement() instead of findElements(); that applies to the line above, not to the lapply() loop below.)
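The error comes from findElements() returning a plain R list of element objects rather than a single element. `bookElem$clickElement` therefore looks for a list entry named clickElement, finds none, gets NULL, and calling NULL raises "attempt to apply non-function". Indexing into the list first avoids that. The illustration uses a hypothetical stand-in (a list holding a function, not a real RSelenium webElement) so it runs without a browser.

```r
# With a live session, index into the list before calling a method:
# bookElem[[1]]$clickElement()

# Self-contained illustration with a hypothetical stand-in element
# (a plain list holding a function, NOT an RSelenium webElement).
fake_elem <- list(clickElement = function() "clicked")
elems <- list(fake_elem, fake_elem)  # findElements() likewise returns a list

# `$` on the outer list finds no entry named clickElement, so it is NULL;
# calling NULL() is what raises "attempt to apply non-function".
print(is.null(elems$clickElement))  # TRUE
print(elems[[1]]$clickElement())    # "clicked"
```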

clack <- lapply(bookElem, function(y) {

   y$clickElement()
   y$goBack() 

})

This throws an error similar to the one in this question:

 Error:      Summary: StaleElementReference
             Detail: An element command failed because the referenced element is no longer attached to the DOM.
             Further Details: run errorDetails method 
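The StaleElementReference error most likely appears because the references in bookElem belong to the results page as it was first loaded; once the browser navigates away and comes back, WebDriver treats that page as a new document, so the old references no longer point at anything in the DOM. One common workaround, sketched here under assumptions, is to re-run findElements() after every goBack() and click only a fresh reference. `visit_results()` and its `on_page`/`pause` arguments are hypothetical names, and the fixed Sys.sleep() pause is a crude stand-in for a proper wait.

```r
# Hypothetical helper: click each search result in turn, going back to the
# results page between clicks. Elements are re-found on every iteration so
# each click uses a reference from the page that is currently loaded.
visit_results <- function(remDr, css = "h3.LC20lb",
                          on_page = function(remDr) NULL, pause = 1) {
  n <- length(remDr$findElements(using = "css", css))
  results <- vector("list", n)
  for (i in seq_len(n)) {
    elems <- remDr$findElements(using = "css", css)  # fresh references
    if (i > length(elems)) break                      # fewer results than before
    elems[[i]]$clickElement()
    Sys.sleep(pause)                                  # crude wait for the page
    results[i] <- list(on_page(remDr))                # keeps NULL results too
    remDr$goBack()
    Sys.sleep(pause)                                  # let the results reload
  }
  results
}
```

With a live session, something like `titles <- visit_results(remDr, on_page = function(d) d$getTitle()[[1]])` would collect each result page's title. Assigning with `results[i] <- list(...)` rather than `results[[i]] <- ...` matters here, because assigning NULL via `[[<-` would delete the list entry.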

Would it be easier to use rvest, within RSelenium?
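rvest can take over the parsing side: RSelenium drives the browser, and the rendered HTML from remDr$getPageSource() is handed to rvest. A minimal sketch, assuming rvest 1.0 or later (for html_elements()) and that each result is still an h3 with class exactly LC20lb inside an `a` tag; `result_links()` is a hypothetical helper name.

```r
library(rvest)

# Hypothetical helper: extract result URLs from an HTML string with rvest.
result_links <- function(html, xpath = "//h3[@class = 'LC20lb']/parent::a") {
  page <- read_html(html)
  html_attr(html_elements(page, xpath = xpath), "href")
}

# With a live session (assumes the results page has finished loading):
# links <- result_links(remDr$getPageSource()[[1]])
```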

papelr

1 Answer


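I think you could grab the links and loop through them without going back to the results page. Because you navigate by URL strings rather than by element references, nothing on the results page can go stale.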

To do that, grab the link elements (the "a" tags that wrap each h3 title):

bookElems <- remDr$findElements(using = "xpath",
                                "//h3[@class = 'LC20lb']//parent::a")

Then extract each element's "href" attribute and navigate to it:

links <- unlist(lapply(bookElems, function(bookElem){
  # getElementAttribute() returns a list, so flatten to a character vector
  bookElem$getElementAttribute("href")
}))

for(link in links){
  remDr$navigate(link)
  # DO SOMETHING
}
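The `# DO SOMETHING` placeholder can hold any per-page work. As a sketch, the hypothetical helper `visit_links()` navigates to each link and collects one result per page, defaulting to remDr$getCurrentUrl() (which returns a list, hence `[[1]]`) so you can confirm where the browser actually ended up.

```r
# Hypothetical helper: navigate to each link and collect on_page()'s result.
visit_links <- function(remDr, links,
                        on_page = function(remDr) remDr$getCurrentUrl()[[1]]) {
  lapply(links, function(link) {
    remDr$navigate(link)
    on_page(remDr)  # per-page work goes here
  })
}

# With a live session:
# pages <- visit_links(remDr, links)
```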

Full code would read:

remDr$open()

remDr$navigate("https://books.google.com/")
books <- remDr$findElement(using = "css", "[name = 'q']")

books$sendKeysToElement(list("NHL teams", key = "enter"))
bookElems <- remDr$findElements(using = "xpath",
                                "//h3[@class = 'LC20lb']//parent::a")

links <- unlist(lapply(bookElems, function(bookElem){
  # getElementAttribute() returns a list, so flatten to a character vector
  bookElem$getElementAttribute("href")
}))

for(link in links){
  remDr$navigate(link)
  # DO SOMETHING
}
Tonio Liebrand
Comment: This worked, and your simple loop is a great base. One can just use `remDr$getCurrentUrl()` to check that it ended on the last link. (papelr, Mar 21 '19 at 15:28)