OK, I'm going to go with Selenium then since it doesn't necessarily require Docker (though the example I'm using is with Docker :-) Pretty sure I could get Splash/splashr to do this as well, but it involves a file download and I think there's issues with that and the Splash back-end. As the splashr
author, I avoid having to deal with GitHub issues if I use Selenium for this example as well ;-)
Anyway, you should install RSelenium
. I can't really provide support for that but it's well documented and the rOpenSci folks are super helpful. I'd highly suggest getting Docker to run on your system or getting your department to setup a Selenium server you all can use.
There are a couple gotchas for this use-case:
- Some element names we need to instrument are dynamically generated so we have to work around that
- This involves downloading a CSV file so we need to map a filesystem path in Docker so it downloads properly
- This is a super slow site so you need to figure out wait times after each interaction (I'm not going to do that since you may be on a slower or faster network and network speed does play a part here, tho not that much)
I'd suggest working through the vignettes for RSelenium before trying the below to get a feel for how it works. You're essentially coding up human page interactions.
You will need to start Docker with a mapped directory. See download file with Rselenium & docker toolbox for all the info but here's how I did it on my macOS box:
docker run -d -v /Users/hrbrmstr/Downloads://home/seluser/Downloads -p 4445:4444 selenium/standalone-firefox:2.53.1
That makes Selenium accessible on port 4445, uses Firefox (b/c Chrome is evil) and maps my local downloads directory to the Firefox default dir for the selenium user in the Docker container. That means well_authorizations_issued.csv
is going to go there (eventually).
Now, we need to crank up R and connect it to this Selenium instance. We need to create a custom Firefox profile since we're saving stuff to disk and we don't want the browser to prompt us for anything:
library(RSelenium)
makeFirefoxProfile(
list(
browser.download.dir = "home/seluser/Downloads",
browser.download.folderList = 2L,
browser.download.manager.showWhenStarting = FALSE,
browser.helperApps.neverAsk.saveToDisk = "text/csv"
)
) -> ffox_prof
remoteDriver(
browserName = "firefox", port = 4445L,
extraCapabilities = ffox_prof
) -> remDr
invisible(remDr$open())
remDr$navigate("https://reports.bcogc.ca/ogc/f?p=AMS_REPORTS:WA_ISSUED")
# Sys.sleep(###)
magick::image_read(openssl::base64_decode(remDr$screenshot()[[1]]))
You will need to uncomment out the Sys.sleep()
s and experiment with various "wait time" values between calls. Some will be short (1-2s) others will be larger (20s, 30s, or higher).
I'm not displaying the output of the screenshots here but those are one way to figure out timings (i.e. keep generating screen shots after an element interaction until gray spinner boxes are gone — etc — and keep a mental note of how many seconds that was).
Now, the one tricky bit noted above is figuring out the where the checkbox is to turn off the filter since it has a dynamic id
. However, we aren't actually going to click on the checkbox b/c the daft fools who created that app have no idea what they are doing and actually have the click-event trapped with the span
element that surrounds it, so we have to find the li
element that contains the checkbox label text then go to the span
element and click on it.
box <- remDr$findElement(using = "xpath", value = "//li[contains(., 'Approval Date is in the last')]/span")
box$clickElement()
# Sys.sleep(###)
magick::image_read(openssl::base64_decode(remDr$screenshot()[[1]]))
^^ definitely needs a delay (you likely saw it spin a while in-person when clicking yourself so you can count that and add in some buffer seconds).
Then, we click on the drop-down "menu" (it's really a button
):
btn1 <- remDr$findElement(using = "css", "button#WA_ISSUED_actions_button")
btn1$clickElement()
# Sys.sleep(###)
magick::image_read(openssl::base64_decode(remDr$screenshot()[[1]]))
Then the download "menu" item (it's really a button
:
btn2 <- remDr$findElement(using = "css", "button#WA_ISSUED_actions_menu_14i")
btn2$clickElement()
# Sys.sleep(###)
magick::image_read(openssl::base64_decode(remDr$screenshot()[[1]]))
^^ also needs rly needs a delay as the Download "dialog" takes a few seconds to come up (it did for me at least).
Now, find the CSV box which is really an a
tag:
lnk <- remDr$findElement(using = "css", "a#WA_ISSUED_download_CSV")
lnk$clickElement()
### WAIT A WHILE
magick::image_read(openssl::base64_decode(remDr$screenshot()[[1]]))
That last bit is something you'll have to experiment with. It takes a while to process the request and then transfer the ~9MB file. The call to rmDr$screenshot()
actually waits for the download to complete so you can remove the display and decoding code and assign the output to a variable and use that as an automatic "wait"er.
I tried this 3x on 2 different macOS systems and it worked fine. YMMV.
I'm guessing you'll want to automate this eventually so you could have a system()
call towards the top of the script that starts the Selenium Docker container, then does the rest of the bits and then issues another system()
call to shut down the Docker container.
Alternately, https://github.com/richfitz/stevedore is now on CRAN so it is a pure R interface to starting/stopping Docker containers (amongst many other things) so you could use that instead of system()
calls.
If you can't use Docker, you need to install a "webdriver" executable for Firefox on your Windows box and also get the Selenium Java archive, ensure you have Java installed and then do the various manual incantations to get that going (which is beyond the scope of this answer).
Here's a shortened, contiguous version of the above:
library(RSelenium)
# start Selenium before doing this
makeFirefoxProfile(
list(
browser.download.dir = "home/seluser/Downloads",
browser.download.folderList = 2L,
browser.download.manager.showWhenStarting = FALSE,
browser.helperApps.neverAsk.saveToDisk = "text/csv"
)
) -> ffox_prof
remoteDriver(
browserName = "firefox", port = 4445L,
extraCapabilities = ffox_prof
) -> remDr
invisible(remDr$open())
remDr$navigate("https://reports.bcogc.ca/ogc/f?p=AMS_REPORTS:WA_ISSUED")
# Sys.sleep(###)
box <- remDr$findElement(using = "xpath", value = "//li[contains(., 'Approval Date is in the last')]/span")
box$clickElement()
# Sys.sleep(###)
btn1 <- remDr$findElement(using = "css", "button#WA_ISSUED_actions_button")
btn1$clickElement()
# Sys.sleep(###)
btn2 <- remDr$findElement(using = "css", "button#WA_ISSUED_actions_menu_14i")
btn2$clickElement()
# Sys.sleep(###)
lnk <- remDr$findElement(using = "css", "a#WA_ISSUED_download_CSV")
lnk$clickElement()
### WAIT A WHILE
done <- remDr$screenshot()
# stop Selenium