
I'm looking to use RSelenium and Tor on my Linux machine to return the Tor IP (with Firefox as the Tor Browser). This is doable with Python, but I'm having trouble with it in R. Can anybody get this to work? Please share your solution for either Windows or Linux.

```r
# library(devtools)
# devtools::install_github("ropensci/RSelenium")
library(RSelenium)

RSelenium::checkForServer()
RSelenium::startServer()

binaryExtension <- paste0(Sys.getenv('HOME'),"/Desktop/tor-browser_en-US/Browser/firefox")
remDr <- remoteDriver(dir = binaryExtention)

remDr$open()
remDr$navigate("http://myexternalip.com/raw")
remDr$quit()
```

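The error `Error in callSuper(...) : object 'binaryExtention' not found` is returned. (That particular message comes from the misspelled variable name, `binaryExtention` vs. `binaryExtension`; the real problem is getting RSelenium to launch the Tor Browser binary.)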

For community reference, this Selenium code works in Windows using Python3:

```python
import os

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary

# Tor Browser unpacked on the Desktop; expanduser("~") resolves the user's home folder
tor_dir = os.path.join(os.path.expanduser("~"), "Desktop", "Tor Browser", "Browser")

binary = FirefoxBinary(os.path.join(tor_dir, "firefox.exe"))
profile = FirefoxProfile(os.path.join(tor_dir, "TorBrowser", "Data", "Browser", "profile.default"))

# Launch Tor Browser's Firefox with its own profile, then read the external IP
driver = webdriver.Firefox(firefox_profile=profile, firefox_binary=binary)
driver.get("http://myexternalip.com/raw")
soup = BeautifulSoup(driver.page_source, "lxml")  # requires the lxml parser
driver.quit()

print("Current Tor IP: " + soup.text.strip())

# Based in part on
# http://stackoverflow.com/questions/13960326/how-can-i-parse-a-website-using-selenium-and-beautifulsoup-in-python
# http://stackoverflow.com/questions/34316878/python-selenium-binding-with-tor-browser
# http://stackoverflow.com/questions/3367288/insert-variable-values-into-a-string-in-python
```
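On the R side, `http://myexternalip.com/raw` returns little more than the address, so the R counterpart of `soup.text.strip()` is to pull the IP out of the page source. A minimal base-R sketch; `extract_ipv4()` is a hypothetical helper (not part of RSelenium), and the regex only checks the dotted-quad shape, not that each part is in the range 0 to 255.

```r
# Hypothetical helper (not part of RSelenium): return the first IPv4-looking
# string in a page's HTML or text, or NA if there is none.
extract_ipv4 <- function(html) {
  # Four dot-separated groups of 1-3 digits; does NOT validate the 0-255 range
  pattern <- "\\b(?:[0-9]{1,3}\\.){3}[0-9]{1,3}\\b"
  m <- regmatches(html, regexpr(pattern, html, perl = TRUE))
  if (length(m) == 0) NA_character_ else m
}

# Usage with an open RSelenium session (not run here):
# remDr$navigate("http://myexternalip.com/raw")
# ip <- extract_ipv4(remDr$getPageSource()[[1]])
# cat("Current Tor IP:", ip, "\n")
```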
Bob Hopez
  • Please view [this link](http://stackoverflow.com/questions/38799909/rselenium-doesnt-connect), which suggests setting the binary via `javaargs` in `startServer`. However, when I do this it opens only regular (non-Tor) Firefox and then throws a `driver.version: unknown` error. – Bob Hopez Aug 19 '16 at 23:15
  • I've opened a bounty if you know how to do this for the new RSelenium: https://stackoverflow.com/questions/50829457/rselenium-with-tor-with-new-rselenium-version – Neal Barsch Jun 16 '18 at 00:14

3 Answers


Something like the following should work:

```r
browserP <- paste0(Sys.getenv('HOME'),"/Desktop/tor-browser_en-US/Browser/firefox")
jArg <- paste0("-Dwebdriver.firefox.bin='", browserP, "'")
selServ <- RSelenium::startServer(javaargs = jArg)
```

UPDATE:

This worked for me on Windows. First, download the beta version of the Selenium server:

```r
checkForServer(update = TRUE, beta = TRUE, rename = FALSE)
```

Next, open an instance of Tor Browser manually.

```r
library(RSelenium)
browserP <- "C:/Users/john/Desktop/Tor Browser/Browser/firefox.exe"
jArg <- paste0("-Dwebdriver.firefox.bin=\"", browserP, "\"")
pLoc <- "C:/Users/john/Desktop/Tor Browser/Browser/TorBrowser/Data/Browser/profile.meek-http-helper/"
jArg <- c(jArg, paste0("-Dwebdriver.firefox.profile=\"", pLoc, "\""))
selServ <- RSelenium::startServer(javaargs = jArg)

remDr <- remoteDriver(extraCapabilities = list(marionette = TRUE))
remDr$open()
remDr$navigate("https://check.torproject.org/")

> remDr$getTitle()
[[1]]
[1] "Congratulations. This browser is configured to use Tor."
```
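The escaped quoting in the `jArg` lines above is easy to get wrong, and the geckodriver error mentioned in the comments below asks for one more property, `webdriver.gecko.driver`. A small sketch that builds all three `-Dname="value"` arguments from a named list; `java_sys_props()` is a hypothetical helper and the geckodriver location is an assumed path, so adjust both to your setup. `startServer()` only exists in the old RSelenium, so that call is left commented out.

```r
# Hypothetical helper: turn a named list of JVM system properties into
# -Dname="value" strings, wrapping each value in escaped double quotes
# as the jArg lines above do (so paths containing spaces stay intact).
java_sys_props <- function(props) {
  stopifnot(is.list(props), !is.null(names(props)))
  vapply(names(props),
         function(name) paste0("-D", name, "=\"", props[[name]], "\""),
         character(1), USE.NAMES = FALSE)
}

tor_dir <- "C:/Users/john/Desktop/Tor Browser/Browser"
jArg <- java_sys_props(list(
  webdriver.firefox.bin     = file.path(tor_dir, "firefox.exe"),
  webdriver.firefox.profile = file.path(tor_dir, "TorBrowser/Data/Browser/profile.meek-http-helper/"),
  # assumption: geckodriver.exe was downloaded to this (hypothetical) location
  webdriver.gecko.driver    = "C:/Users/john/Desktop/geckodriver.exe"
))
print(jArg)

# selServ <- RSelenium::startServer(javaargs = jArg)  # old RSelenium API only
```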
jdharrison
  • This opens the Firefox (non-Tor) browser, then throws a long `server-side error` – Bob Hopez Aug 19 '16 at 23:38
  • It would open the browser that is given by the path. Looking at the python example you would also need to provide a firefox tor profile. – jdharrison Aug 19 '16 at 23:39
  • look at `?RSelenium::getFirefoxProfile` – jdharrison Aug 19 '16 at 23:42
  • Originally I went down that path. It may be something like `-Dwebdriver.firefox.profile`, based on your answer above and [here](https://groups.google.com/forum/#!topic/jmeter-plugins/kgcOm80d-vY) – Bob Hopez Aug 19 '16 at 23:44
  • Do you know how to combine `javaargs` in R in this context? This reference doesn't cover that: https://siteobservers.zendesk.com/hc/en-us/articles/201989070-Start-selenium-server-with-additional-command-line-arguments – Bob Hopez Aug 19 '16 at 23:49
  • Perhaps you want to give this code a try, used the suggested java argument approach, but still not opening Tor without error. http://www.r-fiddle.org/#/fiddle?id=YyZt6ZLG – Bob Hopez Aug 20 '16 at 00:10
  • The above works for me on windows. An existing Tor browser needs to be running. – jdharrison Aug 20 '16 at 02:01
  • Tried your code on Windows, and getting `Selenium message: The path to the driver executable must be set by the webdriver.gecko.driver system property; for more information, see https://github.com/jgraham/wires. The latest version can be downloaded from https://github.com/jgraham/wires `. Did you install gecko into the Tor folder? – Bob Hopez Aug 20 '16 at 02:34
  • Why add the backslashes in `...bin=\"` and `...profile=\"`? Also, in Python the driver itself opens the Tor Browser; does `remDr$open()` not do that in R? – Bob Hopez Aug 20 '16 at 02:35
  • @BobHopez sorry, you have lost me. The backslashes are used to escape the `"` characters. The `open` method is a member of the `remoteDriver` class in `RSelenium` – jdharrison Aug 20 '16 at 02:41
  • To clarify one point: In Python, the command above launches the Tor browser: `webdriver.Firefox(profile, binary)`. And when working with rSelenium, the `remDr$open()` launches the Firefox/Chrome browser usually. – Bob Hopez Aug 20 '16 at 02:50
  • Thanks for clarifying why the two backslashes are used. Did you install geckodriver? – Bob Hopez Aug 20 '16 at 02:52
  • Why `list(marionette = TRUE)`? – Bob Hopez Aug 20 '16 at 02:53
  • You can download the geckodriver and add it as a system property; see https://github.com/ropensci/RSelenium/issues/81 – jdharrison Aug 20 '16 at 02:58
  • And thank you for clarifying to the community. Why not Tor 'profile.default' such as with the Python example? – Bob Hopez Aug 20 '16 at 02:58
  • Strange that rSelenium requires it to launch Tor Firefox in Windows, whereas the above Python launches Tor Firefox browser without a path to gecko. – Bob Hopez Aug 20 '16 at 03:02
  • What is the updated version of this @jdharrison? startServer is defunct with new RSelenium, so how do you pass javaargs to remoteDriver? – Neal Barsch Jun 13 '18 at 05:04
  • I've opened a bounty for an answer using the new RSelenium syntax: https://stackoverflow.com/questions/50829457/rselenium-with-tor-with-new-rselenium-version – Neal Barsch Jun 16 '18 at 00:14

This works on macOS Sierra.

First, configure a manual proxy in both Firefox and Tor Browser.

From the browser menu bar, go to Preferences > Advanced > Network > Settings.

Set SOCKS Host to 127.0.0.1 and Port to 9150, and select SOCKS v5.

You also need to have Tor Browser open while running the R script in RStudio; otherwise Firefox shows the message "The proxy server is refusing connections".
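To fail early instead of hitting that proxy error in the browser, you can check from R that something is listening on the SOCKS port before opening the session. A base-R sketch, assuming Tor Browser's default SOCKS port 9150 as configured above; `tor_socks_is_up()` is a hypothetical name, and a successful TCP connection only shows that the port is open, not that Tor has finished connecting to the network.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port succeeds.
# Assumption: Tor Browser's SOCKS proxy listens on 127.0.0.1:9150.
tor_socks_is_up <- function(host = "127.0.0.1", port = 9150L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(  # a refused connection also emits a warning
      socketConnection(host = host, port = port, blocking = TRUE,
                       open = "r+b", timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

if (!tor_socks_is_up()) {
  message("Nothing is listening on 127.0.0.1:9150; start Tor Browser first.")
}
```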

You also need to put the name of your Firefox profile (the profile-name folder) into the script.

Open Finder and go to /Users/username/Library/Application Support/Firefox/Profiles/profile-name
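Instead of copying the profile name by hand, the folder can be looked up from R. A sketch under the assumption that the profile folder name contains `.default` (as in `nfqudbv2.default-1484451212373` in the script); `find_firefox_profile()` is hypothetical, and if several profiles match it simply returns the first one in alphabetical order, which may not be the one you use.

```r
# Hypothetical helper: return the full path of the first Firefox profile folder
# whose name contains ".default" (assumption), under the macOS profiles directory.
find_firefox_profile <- function(
    profiles_dir = file.path(Sys.getenv("HOME"), "Library", "Application Support",
                             "Firefox", "Profiles")) {
  hits <- list.files(profiles_dir, pattern = "\\.default", full.names = TRUE)
  hits <- hits[dir.exists(hits)]  # keep folders only
  if (length(hits) == 0) stop("No *.default profile found in ", profiles_dir)
  hits[1]
}

# Usage (macOS, not run here):
# fprof <- RSelenium::getFirefoxProfile(find_firefox_profile(), useBase = TRUE)
```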

My R test script:

```r
library(RSelenium)

# replace username and the profile folder name with your own
fprof <- getFirefoxProfile("/Users/username/Library/Application Support/Firefox/Profiles/nfqudbv2.default-1484451212373", useBase = TRUE)

remDrv <- remoteDriver(browserName = "firefox",
                       extraCapabilities = fprof)

remDrv$open()
remDrv$navigate("https://check.torproject.org/")
```

This will open an instance of the Firefox browser with the message "Congratulations. This browser is configured to use Tor."

Ashley72

Caveat: I have not tested extensively, but it seems to work.

This relies on some ideas from @Ashley72, but avoids the manual setup and copying (as well as the now-defunct RSelenium functions needed for @jdharrison's solution), and on some ideas from https://indranilgayen.wordpress.com/2016/10/24/make-rselenium-work-with-r/. Adjust the following profile options (I usually adjust a number of other options, but they do not seem relevant to the question):

```r
library(RSelenium)

fprof <- makeFirefoxProfile(list(
  network.proxy.socks = "127.0.0.1",      # proxy host IP
  network.proxy.socks_port = 9150L,       # proxy port; the "L" (integer) suffix is very important, without it the setting has no effect
  network.proxy.type = 1L,                # 1 = manual, 2 = automatic configuration script; "L" matters here too
  network.proxy.socks_version = 5L,       # ditto
  network.proxy.socks_remote_dns = TRUE   # resolve DNS through the proxy
))
```

Then you start the server as usual:

```r
# worked with Selenium server 3.3.1, geckodriver 0.15.0 and Firefox 52
rD <- rsDriver(port = 4445L, browser = "firefox", version = "latest", geckover = "latest",
               iedrver = NULL, phantomver = "2.1.1", verbose = TRUE, check = TRUE,
               extraCapabilities = fprof)
remDr <- rD$client
remDr$navigate("https://check.torproject.org/")  # should confirm Tor is set up
remDr$navigate("http://whatismyip.org/")         # should show the Tor exit IP, not your own
```

As you see, I have not made changes to the marionette option. I have no idea what the implications might be. Please comment.

EDIT: It seems the Tor Browser has to be up and running. Otherwise, the browser opened by RSelenium gives the error "proxy server refusing connections".

BBB