
I am using rvest to try to scrape the High Interest Savings rate of 1.40 from this site. First I tried to use Selector Gadget to obtain the appropriate CSS selector. This did not work because I believe the table is generated by JavaScript. Taking a hint from @QHarr here: if JavaScript is turned off, the table does not render.

If I try part of his example here:

# Rvest and Selector Gadget example

library("rvest")

from_html <- read_html("https://www.alternabank.ca/rates") %>%
  html_node('body') %>%
  html_text() %>%
  toString()

`from_html` does not contain the interest rate I am looking for, 1.40.

What is the best way in R to scrape the interest rates from this page?

ixodid
    For websites that require javascript to be run, you'll want to use something like RSelenium to run a headless browser you can interact with. See also: https://blog.brooke.science/posts/scraping-javascript-websites-in-r/. Alternatively open up your browser dev tools and see if you can find if the data is being pulled from a different URL and grab that directly. – MrFlick Aug 24 '20 at 20:16
  • @MrFlick this site's data can be obtained as JSON without Selenium but requires a fair bit of parsing to get at the desired data. I have a full working solution for the OP but can't share it since you closed the question. What's your advice in these circumstances? Are requests for help scraping specific sites too specific to make good SO questions, even if each one is a bit different in its requirements? Is that question itself one for Meta perhaps? – Allan Cameron Aug 24 '20 at 20:37
  • @MrFlick FWIW, the single answer in the dupe you found requires Selenium and phantomjs, which are not required to solve the problem at hand. The other technique you mentioned is the better one here. – Allan Cameron Aug 24 '20 at 20:41
  • @AllanCameron My opinion is that it doesn't make sense to open a new question for every possible URL out there and every bit of data from every page. There are other duplicates that go through developer tools to find the JSON links and whatnot. But if you think this one is particularly educational or likely to help others, I'll reopen. But most questions of this type don't seem to meet the "shows research effort" requirement – MrFlick Aug 24 '20 at 20:41
  • @AllanCameron Well, with RSelenium, you don't have to constantly reverse engineer different websites. You just interact with the site by issuing the commands you would normally perform as a user. Reverse engineering seems more fragile if the implementation changes. But really, web scraping in all forms is inherently fragile and likely to break over time. – MrFlick Aug 24 '20 at 20:46
  • @MrFlick thanks. I agree with you, though I do feel a bit sorry for the OP. I don't think my answer is particularly educational or of broader importance, so I won't ask you to reopen. I'll just ask the OP to try `jsonlite::fromJSON("https://www.alternabank.ca/gateway/api/rates-presentation-service/v2/rates")` which will get them most of the way there. – Allan Cameron Aug 24 '20 at 20:48

0 Answers