Scraping a website protected by a password in R

Question

I've looked for information regarding this problem (such as found here: Scrape password-protected website in R and threads linked to it) but I can't solve the quirks for my particular website.

I'd like to use this website, http://www.footballdatabase.eu/, which is password protected. Once logged in, I would navigate to a page and start scraping. I've set up the code as follows:

library(httr)

url <- "http://www.footballdatabase.eu"
login <- list(login = "username", password = "password")

response <- POST(url = url, body = login)

After that, I would navigate to a page by entering a new link:

library(XML)

link <- "http://www.footballdatabase.eu/football.match.cagliari.ac-milan.1141156.en.html"
doc <- htmlParse(link, encoding = "UTF-8")
table <- readHTMLTable(doc)

and then I would take the information from the table.

I would like it set up so that, in theory, I could either loop through multiple webpages after logging in once, or log in each time (if that's more convenient). Currently I get the following output from response:

Response [http://www.footballdatabase.eu/]
Date: 2017-05-31 01:03
Status: 200
Content-Type: text/html
Size: 2.17 kB
You have exceeded your amount of pages visited during the day. Please sign in with your login or register.

Thank you for any help!

You might want to try out what's done in [this example](https://github.com/hadley/rvest/blob/master/demo/united.R). I think you'll want to open the session, login, then navigate around within the same session. — austensen, May 31 '17 at 01:30
That error message could indicate your login is not working and the site thinks you are just a robot. — Mark Stewart, May 31 '17 at 01:43
@austensen I've had a look at this, but unfortunately I don't think the page uses a log-in form. I've not been able to work out how to select the button. — Guy W, Jun 01 '17 at 16:50
@MarkStewart It's definitely not working currently; I'm just at a loss as to how to click the button to log in, since the page doesn't appear to use a form. It's something to do with this "connectu" variable in the HTML. — Guy W, Jun 01 '17 at 16:51
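
For what it's worth, here is a minimal sketch of the "log in once, then stay in the same session" idea from the comments above, written with httr and XML rather than rvest, since the page may not expose a form. It has not been tested against the site. The login URL and the field names (`login`, `password`) are simply copied from the question and are assumptions; the real request should be checked in the browser's network tab. `log_in`, `parse_tables` and `scrape_tables` are hypothetical helper names. The point it illustrates is that pages are fetched with `GET()` so that any cookies from the login are sent along, instead of handing the bare URL to `htmlParse()`.

```r
library(httr)
library(XML)

# ASSUMPTION: placeholder endpoint and field names taken from the question;
# the site's real login request may use a different URL and parameters.
log_in <- function(login_url, username, password) {
  resp <- POST(login_url,
               body = list(login = username, password = password),
               encode = "form")
  stop_for_status(resp)
  resp
}

# Parse every <table> found in a string of HTML into a list of data frames.
parse_tables <- function(html) {
  doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Fetch a page through httr and parse its tables. httr reuses one handle per
# host within an R session, so cookies set at login are sent with this request.
scrape_tables <- function(link) {
  resp <- GET(link)
  stop_for_status(resp)
  parse_tables(content(resp, as = "text", encoding = "UTF-8"))
}

# The network calls are guarded so they only run in an interactive session.
if (interactive()) {
  log_in("http://www.footballdatabase.eu", "username", "password")
  links <- c(
    "http://www.footballdatabase.eu/football.match.cagliari.ac-milan.1141156.en.html"
  )
  results <- lapply(links, scrape_tables)
}
```

If the response body still contains the "exceeded your amount of pages" message after `log_in()`, the login request itself is probably not what the site expects, and no amount of session handling will help until that is sorted out.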

