
I am trying to scrape a website with RSelenium. However, I run into problems when I try to connect to the Selenium server.

Suppose I use the rsDriver() command to start a Selenium server and browser:

rsDriver(browser = c('firefox'))

This is the output generated:

[1] "Connecting to remote server"
Error in checkError(res) :
Couldnt connect to host on http://localhost:4567/wd/hub.
Please ensure a Selenium server is running.
In addition: Warning message:
In rsDriver(browser = c("firefox")) : Could not determine server status.

Alternatively, I tried this command (found in another thread on Stack Overflow):

remDr <- remoteDriver(remoteServerAddr = "localhost" 
                      , port = 4444L
                      , browserName = "htmlunit"
)
remDr$open()

But it fails:

[1] "Connecting to remote server"
Error in checkError(res) : 
  Couldnt connect to host on http://localhost:4444/wd/hub.
  Please ensure a Selenium server is running.

This is my sessionInfo() output:

R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2

locale:
[1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] seleniumPipes_0.3.7 whisker_0.3-2       magrittr_1.5        xml2_1.1.1          jsonlite_1.2        httr_1.2.1         
[7] RSelenium_1.7.1     wdman_0.2.2        

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.9      XML_3.98-1.5     binman_0.1.0     assertthat_0.1   bitops_1.0-6     rappdirs_0.3.1   R6_2.2.0        
 [8] semver_0.2.0     curl_2.3         subprocess_0.8.0 tools_3.3.2      yaml_2.1.14      caTools_1.17.1   openssl_0.9.6   

I use Firefox 51.0.1 (64-bit) on macOS Sierra 10.12.2.

Any help is greatly appreciated!

Thomas Reiss

3 Answers


Thanks @jdharrison! I had a similar problem and was puzzled because yesterday RSelenium was still working fine, but today it would not start the browser anymore. Running:

library(wdman)
selServ <- wdman::selenium(verbose = FALSE)
selServ$log()

showed me that the problem was caused by a corrupt jar file that had been downloaded overnight:

"Error: Invalid or corrupt jarfile C:\\Users\\user.name\\AppData\\Local\\binman\\binman_seleniumserver\\generic\\3.8.0/selenium-server-standalone-3.8.0.jar"
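A .jar file is a zip archive, so one way to confirm that a download is damaged, without starting Java, is to check whether R can list its contents. This is a minimal base-R sketch; `is_valid_jar()` is a hypothetical helper, not part of RSelenium or wdman.

```r
# Hypothetical helper: TRUE if `path` exists and can be listed as a zip
# archive. A .jar is a zip file, so a truncated or corrupt download
# normally fails here.
is_valid_jar <- function(path) {
  if (!file.exists(path)) return(FALSE)
  tryCatch({
    entries <- utils::unzip(path, list = TRUE)
    nrow(entries) > 0
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}
```

Pointed at the jar path reported in the log, a damaged download like the one above would be expected to return FALSE.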

By default, the rsDriver() function in RSelenium uses the newest selenium-server-standalone jar file. Everything worked normally again when I ran rsDriver() with the previous version instead:

rD <- rsDriver(verbose = FALSE, version = "3.7.1")
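To avoid hard-coding "3.7.1", the newest version older than the broken one can be computed from a vector of version strings with base R's numeric_version(). This is a sketch; `previous_version()` is a hypothetical helper, and the vector of available versions is assumed to come from elsewhere (for example, the version folders under binman_seleniumserver/generic that appear in the log path).

```r
# Hypothetical helper: newest version in `versions` that is strictly
# older than `exclude` (e.g. the release whose download is corrupt).
# Returns NA if there is no older version.
previous_version <- function(versions, exclude) {
  older <- versions[numeric_version(versions) < numeric_version(exclude)]
  if (length(older) == 0) return(NA_character_)
  # numeric_version() compares component-wise, so "3.10.0" > "3.9.1"
  as.character(max(numeric_version(older)))
}
```

For example, `rsDriver(verbose = FALSE, version = previous_version(c("3.7.1", "3.8.0"), exclude = "3.8.0"))` would request 3.7.1.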
eh21

Check whether a Selenium server is running. You can try starting one automatically:

library(RSelenium)
library(wdman)
selServ <- wdman::selenium(verbose = FALSE)
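Starting the process does not by itself prove that the server is answering. A minimal base-R sketch that requests the server's status page follows; `selenium_is_up()` is hypothetical, and it assumes a Selenium 3.x standalone server, which serves `/wd/hub/status` under the same `/wd/hub` path that appears in the error messages.

```r
# Hypothetical helper: TRUE if something answers the Selenium status
# endpoint on host:port (assumes the Selenium 3.x /wd/hub/status path).
selenium_is_up <- function(host = "localhost", port = 4567L) {
  status_url <- sprintf("http://%s:%d/wd/hub/status", host, as.integer(port))
  con <- url(status_url)
  on.exit(close(con))
  # A refused connection or HTTP error raises a warning/error: treat as down.
  body <- tryCatch(
    readLines(con, warn = FALSE),
    error = function(e) character(0),
    warning = function(w) character(0)
  )
  length(body) > 0
}
```

Note that the server started here and the rsDriver() call in the question use port 4567, while the question's remoteDriver() call uses 4444, so `selenium_is_up(port = 4567L)` and `selenium_is_up(port = 4444L)` can give different answers.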

You can then check the logs to see if there are any issues:

selServ$log()
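Logs can be long, so it may help to filter them down to lines that look like problems. This is a sketch; `log_errors()` is hypothetical, and it assumes the log is a character vector or a list of character vectors (for example separate stderr and stdout elements).

```r
# Hypothetical helper: keep only log lines that look like errors.
# Flattens a list of character vectors (e.g. stderr/stdout) first.
log_errors <- function(log) {
  lines <- unlist(log, use.names = FALSE)
  grep("error|exception|invalid|corrupt", lines,
       ignore.case = TRUE, value = TRUE)
}
```

Usage: `log_errors(selServ$log())`. In the corrupt-download case described in the other answer, this would surface the "Invalid or corrupt jarfile" line.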

Alternatively, you can try running a Selenium server manually:

library(RSelenium)
library(wdman)
selServ <- wdman::selenium(retcommand = TRUE, verbose = FALSE)

Then manually run the output from cat(selServ) in a terminal:

> cat(selServ)
/usr/bin/java -Dwebdriver.chrome.driver='/Users/admin/Library/Application Support/binman_chromedriver/mac64/2.27/chromedriver' -Dwebdriver.gecko.driver='/Users/admin/Library/Application Support/binman_geckodriver/macos/0.14.0/geckodriver' -Dphantomjs.binary.path='/Users/admin/Library/Application Support/binman_phantomjs/macosx/2.1.1/phantomjs-2.1.1-macosx/bin/phantomjs' -jar '/Users/admin/Library/Application Support/binman_seleniumserver/generic/3.0.1/selenium-server-standalone-3.0.1.jar' -port 4567
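The command above starts the server with `-port 4567`, whereas the remoteDriver() call in the question connects to port 4444; the two have to match. A small sketch that reads the port back out of the command string; `command_port()` is a hypothetical helper and assumes the `-port <number>` form shown above.

```r
# Hypothetical helper: extract the value of "-port" from the java command
# string returned by wdman::selenium(retcommand = TRUE).
command_port <- function(cmd) {
  m <- regmatches(cmd, regexpr("-port[[:space:]]+[0-9]+", cmd))
  if (length(m) == 0) return(NA_integer_)
  as.integer(sub("-port[[:space:]]+", "", m))
}
```

The result can then be passed on, e.g. `remoteDriver(remoteServerAddr = "localhost", port = command_port(selServ), browserName = "firefox")`.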
jdharrison
  • Thank you very much for your help. Your solution didn't work for some reason; however, I was able to get it running another way. – Thomas Reiss Feb 17 '17 at 13:32
  • @J.Doe If the solution didn't work, you should post the output from the logs. If you were able to get it running another way, you should post that as an answer to help future viewers. – jdharrison Feb 17 '17 at 13:37
  • Thanks. @jdharrison, do you happen to know whether it's possible to change the path to java.exe on Windows? The default c:\windows\system32\java.exe seems to be outdated. (I replaced it and now it works, i.e. the server starts from a script, but changing the path might have been the better option.) – lukeA Mar 10 '17 at 12:04
  • Not currently, but I can add an option to the wdman package. Currently it takes the system value for java: https://github.com/johndharrison/wdman/blob/3678b7f2a9e2a11a895f79a551f5c0a2405adbaf/R/selenium.R#L125 . An optional path to Java could be given in the `selenium` function. – jdharrison Mar 10 '17 at 12:08
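Since wdman uses whichever java the system finds, it can help to check which executable that is, and its version, before replacing files by hand. A base-R sketch; `java_info()` is a hypothetical helper.

```r
# Hypothetical helper: report the java executable found on the PATH and
# the text printed by `java -version`.
java_info <- function() {
  path <- unname(Sys.which("java"))
  if (!nzchar(path)) {
    return(list(path = NA_character_, version = character(0)))
  }
  # `java -version` writes to stderr, so capture both streams.
  # system2() quotes `command` itself, so paths with spaces are fine.
  version <- tryCatch(
    system2(path, "-version", stdout = TRUE, stderr = TRUE),
    error = function(e) character(0)
  )
  list(path = path, version = version)
}
```

If `java_info()$path` points at an old copy such as the one in system32, that would explain the behavior described in the comment above.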

I was able to fix this by passing extra capabilities to the rsDriver() call that give the explicit path to my Firefox .exe file. I've used this approach for a few years, but had a similar issue a few months ago when a Firefox update changed the path from one in the app data directory to the other location listed generically below.

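# Point the driver at an explicit Firefox binary
eCaps <- list("firefox_binary" = "C:/Users/username/AppData/Local/Mozilla Firefox/firefox.exe")
# rsDriver() returns a list with $server and $client (the remoteDriver)
remDr <- rsDriver(verbose = FALSE,
                  browser = "firefox",
                  extraCapabilities = eCaps)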
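Because a Firefox update can move the binary, one option is to list several candidate locations and use the first one that exists. A base-R sketch; `first_existing()` is hypothetical, and both candidate paths are assumptions (the first is the one from this answer, the second is a common default install location).

```r
# Hypothetical helper: first candidate path that exists on disk, or NA.
first_existing <- function(candidates) {
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Candidate locations are assumptions: adjust "username" and add your own.
firefox_candidates <- c(
  "C:/Users/username/AppData/Local/Mozilla Firefox/firefox.exe",
  "C:/Program Files/Mozilla Firefox/firefox.exe"
)
ff_path <- first_existing(firefox_candidates)
# Only set firefox_binary when a binary was actually found.
eCaps <- if (is.na(ff_path)) list() else list(firefox_binary = ff_path)
```

The resulting `eCaps` can then be passed as `extraCapabilities = eCaps` to rsDriver() as in the call above.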

For what it's worth, I've found that, in contrast to Python Selenium, in R Firefox updates are much easier to handle with this method. Keeping Chrome browser and driver versions consistent has been a headache in R for me, though the default settings of Python Selenium don't seem to have these problems.