Hello fellow R fanatics...
I've been using R to scrape data from a variety of websites for a while now, but this one has me stumped.
I am trying to scrape the data from the following table: http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b=
However, my efforts so far have failed.
I have tried the following:
- A simple wget, which returns the site's HTML along with some of the JavaScript functions used to populate the table. I haven't managed to work through it and find the parts I could use to grab the data with R's JS utilities, possibly because my JavaScript experience is quite limited.
- The solution from "Reading data from iframe", because the original website appeared to have the table in an iframe, but again no luck.
- A combination of getURL and readHTMLTable:

    library(RCurl)
    library(XML)
    thisURL <- "http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b="
    # theURL holds the page's HTML source returned by getURL()
    theURL <- getURL(thisURL, .opts = list(ssl.verifypeer = FALSE))
    tables <- readHTMLTable(theURL)

This results in an empty table.
- I spent about an hour going through every part of the HTML and JavaScript code I could find, with the same limited success as the wget attempt.
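A likely reason the getURL/readHTMLTable attempt comes back empty is that both only see the static HTML the server sends, before any JavaScript runs; if the table is filled in by script, its rows are simply not there yet. A reasonable first step is therefore to list which script and iframe files the page pulls in, since the data probably arrives through one of them. The function `list_page_resources()` is a hypothetical helper written for this example: a rough, regex-based sketch in base R rather than a real HTML parser, and relative `src` paths still have to be resolved against the page URL.

```r
# Hypothetical helper (not part of any package): list the script and iframe
# sources referenced by a page's static HTML. A table populated by JavaScript
# is empty in this HTML, so its data usually comes from one of these files.
list_page_resources <- function(html) {
  # readLines() returns one element per line; join so multi-line tags match
  html <- paste(html, collapse = "\n")
  # Capture the tag name and the src value of <script ... src="..."> and
  # <iframe ... src="...">, case-insensitively
  pattern <- "<(script|iframe)[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']"
  tags <- regmatches(html, gregexpr(pattern, html, ignore.case = TRUE, perl = TRUE))[[1]]
  if (length(tags) == 0) {
    return(data.frame(tag = character(0), src = character(0)))
  }
  data.frame(
    tag = tolower(sub(pattern, "\\1", tags, ignore.case = TRUE, perl = TRUE)),
    src = sub(pattern, "\\2", tags, ignore.case = TRUE, perl = TRUE),
    stringsAsFactors = FALSE
  )
}
```

Usage might look like `page <- readLines("http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b=", warn = FALSE)` followed by `list_page_resources(page)`; each listed `src` can then be opened in the browser to see whether it contains the observations.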
It appears R's Selenium package (RSelenium) could offer a solution, but I haven't yet figured out how to use it here, probably because I'm unfamiliar with it.
I feel like I'm missing something essential here, perhaps because of my limited knowledge of JS and XML?
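With RSelenium, the general idea would be to let a real browser run the page's JavaScript and then hand the rendered DOM to readHTMLTable(). The sketch below assumes that `RSelenium::rsDriver()` can start a Selenium server and a Firefox driver on your machine (it downloads them and needs Java), and that a fixed pause is long enough for the scripts to fill the table; both are assumptions, and the function names `scrape_rendered_tables()` and `parse_tables()` are hypothetical.

```r
# Sketch only: render the page in a browser driven by RSelenium, then parse
# the resulting HTML with the XML package. Assumes rsDriver() can launch
# Firefox and a Selenium server locally.
scrape_rendered_tables <- function(url, wait_seconds = 5) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  client <- driver$client
  # Shut down the browser session and the server even if something fails
  on.exit({
    client$close()
    driver$server$stop()
  }, add = TRUE)

  client$navigate(url)
  # Crude wait for the page's JavaScript to populate the table (an assumption)
  Sys.sleep(wait_seconds)

  # If the table lives inside an iframe, switching into it first may be needed:
  # client$switchToFrame(client$findElement("css selector", "iframe"))

  # getPageSource() returns a list; its first element is the current DOM as HTML
  rendered <- client$getPageSource()[[1]]
  parse_tables(rendered)
}

# Parse an HTML string and return all of its tables as a list of data frames
parse_tables <- function(html) {
  doc <- XML::htmlParse(html, asText = TRUE)
  XML::readHTMLTable(doc)
}
```

Splitting the parsing step into its own function makes it easy to try on the HTML copied from Chrome's "Inspect" view before wiring up the browser automation.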
UPDATE:
I've noticed that if I right-click the table and use Chrome's "Inspect", it shows HTML containing all of the table's values, which would be very easy to scrape. I'm still not sure how to get to that point in R, though. Does anyone have hints on where to look in the Inspect panel to guide my progress?
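In Chrome's developer tools, the Elements panel (what "Inspect" opens) shows the DOM after the scripts have run, which would explain why the values appear there but not in the downloaded HTML. The Network panel is often the more useful place to look: reload the page with it open, filter by XHR or JS, and look for a request whose response contains the observation values. If that response turned out to be a script storing the values in JavaScript array literals, something like the sketch below could extract them. The array format and the names `extract_js_array()`, `temperatures` and `heures` are assumptions for illustration only; the format vigimeteo.com actually serves is not known here and may be JSON, CSV or something else entirely.

```r
# Hypothetical helper: extract the values of a JavaScript array literal such as
#   var temperatures = ["21.5", "22.0", "20.8"];
# from a script's source text. The variable name and array format are
# assumptions; check the real response in Chrome's Network panel first.
extract_js_array <- function(js, var_name) {
  js <- paste(js, collapse = "\n")
  # Match "<var_name> = [ ... ]" and capture the contents (no nested brackets)
  pattern <- paste0("\\b", var_name, "\\s*=\\s*\\[([^\\]]*)\\]")
  m <- regmatches(js, regexec(pattern, js, perl = TRUE))[[1]]
  if (length(m) < 2) return(character(0))
  # Split on commas, then strip whitespace and surrounding quotes
  values <- trimws(strsplit(m[2], ",", fixed = TRUE)[[1]])
  values <- gsub("^[\"']|[\"']$", "", values)
  values[nzchar(values)]
}
```

Usage would then be along the lines of `js <- readLines(data_url, warn = FALSE)`, where `data_url` is whatever URL the Network panel reveals, followed by `as.numeric(extract_js_array(js, "temperatures"))`. If the response turns out to be JSON, `jsonlite::fromJSON()` would be a better tool than regular expressions.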