10

Has anyone ever filled in a web form remotely from R?

I'd like to do some archery statistics in R using my scores. There is a very handy webpage, http://www.archersmate.co.uk/, that gives you the classification and handicap for a score, and I would naturally like to include these in my stats sheet.

Is it possible to fill in this form remotely and get the results back into R?

Otherwise I would have to get all handicap tables and stick them into a database myself.

UPDATE: We've narrowed the problem down to the fact that the form's submit button is implemented in JavaScript.

alex23lemm
Joanne Demmler
  • http://stackoverflow.com/search?q=[r]+webscraping – Ari B. Friedman Jan 09 '13 at 14:44
  • @AriB.Friedman that's really not enough.. :) this question involves understanding the [hcalc.js](http://www.archersmate.co.uk/scripts/ajax_hcalc.js) javascript, parsing through the `ajaxCalc()` and `ajaxiCalc()` functions to construct the `url` string used in the `/functions/iclass.php` call. i don't see an example of all that using R on S.O. – Anthony Damico Jan 09 '13 at 16:26
  • @AnthonyDamico I suspected it wouldn't be. But it gives her a starting point and a vocabulary, and thus a hope of narrowing her question. You've nobly and ably done a lot of the work of narrowing the question, so perhaps you could edit the question to be more targeted? – Ari B. Friedman Jan 09 '13 at 16:54
  • The answers to this question (http://stackoverflow.com/questions/5396461/how-to-automate-multiple-requests-to-a-web-search-form-using-r) should help, as they are focused on automating the process of filling in a web form and bringing the results into R. – eipi10 Jan 09 '13 at 21:22
  • Yes, I knew it was not just web-scraping. Thanks @eipi10 that does look very promising. – Joanne Demmler Jan 10 '13 at 09:23
  • @AnthonyDamico is right, the problem is hitting the submit button, as this part is written in javascript. – Joanne Demmler Jan 10 '13 at 09:59
  • @eipi10 i don't think the difficult part is hitting enter, i think it's building the input and sending those to `iclass.php`? – Anthony Damico Jan 10 '13 at 10:53
  • Perhaps I'll need to do it in PHP or javascript and only grab the output back into R. I'll give it a hard stare! – Joanne Demmler Jan 10 '13 at 12:38
  • @AnthonyDamico or you can inspect the network debug pane to reverse engineer the requests being sent to the server – hadley Jan 11 '13 at 16:09

4 Answers

9

You can use the RSelenium package to fill out and submit web forms and to retrieve the results.

The following code uses RSelenium to retrieve the results for an example input (Male, Under 18, Longbow, Bristol V, 500):

library(RSelenium)

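# Start Selenium Server --------------------------------------------------------
# Note: checkForServer() and startServer() belong to older RSelenium releases;
# later releases replace them with rsDriver().

checkForServer()
startServer()
remDrv <- remoteDriver()
remDrv$open()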


# Simulate browser session and fill out form -----------------------------------

remDrv$navigate('http://www.archersmate.co.uk/')
remDrv$findElement(using = "xpath", "//input[@value = 'Male']")$clickElement()
Sys.sleep(2) 
remDrv$findElement(using = "xpath", "//select[@id = 'drpAge']/option[@value = 'Under 18']")$clickElement()
remDrv$findElement(using = "xpath", "//input[@value ='Longbow']")$clickElement() 
remDrv$findElement(using = "xpath", "//select[@id = 'rnd']/option[@value = 'Bristol V']")$clickElement()
remDrv$findElement(using = "xpath", "//input[@id ='scr']")$sendKeysToElement(list('5', '0', '0'))
remDrv$findElement(using = "xpath", "//input[@id = 'cmdCalc']")$clickElement()

# Retrieve and download results injecting javascript ---------------------------

Sys.sleep(2)
clsf <- remDrv$executeScript(script = 'return $("#txtClass").val();', args = list())[[1]]
hndcp <- remDrv$executeScript(script = 'return $("#txtHandicap").val();', args = list())[[1]]

remDrv$quit()
remDrv$closeServer()
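
Since the aim is to feed these values into a stats sheet, the two results can be stored next to the inputs that produced them. This is only a sketch: it assumes both values come back as character strings, and the placeholder values and column names below are made up for illustration.

```r
# Placeholder strings standing in for the values returned by executeScript();
# they are NOT real output from the site.
clsf  <- "Bowman"
hndcp <- "42"

# One row per submitted score; the column names are illustrative only.
result <- data.frame(
  gender = "Male",
  age = "Under 18",
  bow = "Longbow",
  round = "Bristol V",
  score = 500,
  classification = clsf,
  handicap = as.integer(hndcp),  # form fields are assumed to come back as text
  stringsAsFactors = FALSE
)

str(result)
```

Rows built this way for several scores could be combined with rbind() before running any statistics on them.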

The default browser for RSelenium is Firefox. RSelenium also supports headless browsing using PhantomJS. To use PhantomJS, you just need to

  • download PhantomJS and place it in the user's PATH
  • replace the code snippets at the beginning and at the end as described below

Default browsing (as shown above):

checkForServer()
startServer()
remDrv <- remoteDriver()

...

remDrv$quit()
remDrv$closeServer()

Headless browsing:

pJS <- phantom()
remDrv <- remoteDriver(browserName = 'phantomjs')

...

remDrv$close()
pJS$stop()
alex23lemm
  • From the source code, how did you know to choose `using = "xpath", "//input[@value = 'Male']"`? How would I know how to fill in text on this webpage: https://ois.dk/ – Esben Eickhardt Apr 02 '17 at 15:55
0

You might want to take a look at RCurl's postForm here, and there's also a nice tutorial here.

Omar Wagih
0

This might not help you, as I am searching for an answer to a similar problem myself, but looking at the URL you would like to scrape, the forms to fill in are actual HTML forms, and you can get their description with:

library(RHTMLForms)  # provides getHTMLFormDescription()

url <- "http://www.archersmate.co.uk/"
forms <- getHTMLFormDescription(url)

The "RHTMLForms" package is available on omegahat.org.

0

This cannot be done with RCurl's postForm alone, because the form is not submitted conventionally: its button triggers an AJAX event.
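
The comments on the question suggest a way around this: reproduce the request that the page's JavaScript sends to `/functions/iclass.php`. A minimal sketch of building such a request URL in base R follows. It assumes the endpoint accepts a GET query string, which is not confirmed here. The helper `build_iclass_url()` and the parameter names (`gender`, `age`, `bow`, `round`, `score`) are hypothetical placeholders; the real names would have to be read from `ajax_hcalc.js` or from the browser's network pane.

```r
# Build a query URL for the endpoint named in the comments on the question.
# ASSUMPTION: the endpoint takes GET parameters, and the parameter names used
# in the example call are hypothetical, not the site's real ones.
build_iclass_url <- function(params,
                             base = "http://www.archersmate.co.uk/functions/iclass.php") {
  # Percent-encode each value so spaces and symbols survive in the URL
  encoded <- vapply(
    params,
    function(v) URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  # Join name=value pairs with "&" and append them to the base URL
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

url <- build_iclass_url(list(
  gender = "Male",
  age = "Under 18",
  bow = "Longbow",
  round = "Bristol V",
  score = 500
))
print(url)
```

Once the real parameter names are known, the resulting URL could be fetched with, for example, `readLines(url, warn = FALSE)` and the response parsed in R, with no browser involved.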

brucezepplin