Assuming you know the URLs for each of the datasets, a similar question can be found here:
Download a file from HTTPS using download.file()
For this it becomes:
library(RCurl)
URL <- "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"
x <- getURL(URL)
URLout <- read.csv(textConnection(x),row.names=NULL)
I obtained the URL by right-clicking the access button and copying the address.
I had to declare row.names=NULL because the number of columns in the first row does not match the number of columns elsewhere, so read.csv assumes row names, as described here. I'm not sure whether the URLs to these datasets change when they are updated, but this isn't a particularly convenient way to get the data. The JSON doesn't seem much better for intuitively switching between datasets.
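If you want to see where that mismatch comes from before parsing, a quick diagnostic is to look at the raw text and count the comma-separated fields per line with base R's count.fields(); this is just a sketch for inspection, not something the import itself needs:

library(RCurl)

URL <- "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"
x <- getURL(URL)

# Peek at the first few raw lines of the response
head(strsplit(x, "\r?\n")[[1]], 5)

# Count the comma-separated fields per line; a first row with a different
# count is what makes read.csv assume row names
head(count.fields(textConnection(x), sep = ","), 5)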
At least this way you could create a list of URLs and perform the following:
URL <- list(getURL("http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"),
            getURL("http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr72-eng.htm&lan=eng"))
URLout <- lapply(URL, function(x) read.csv(textConnection(x), row.names=NULL, skip=2))
Again, I don't like having to declare row.names=NULL, and when I look at the file I don't see the discrepancy in the number of columns; however, this will at least get the files into the R environment for you. It may take some more work to perform the operation over multiple URLs.
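If you do end up looping over several tables, one way to keep things organized is a named vector of URLs so the resulting list of data frames is labelled; this is only a sketch using the two filenames from above, and you would substitute whichever tables you actually need:

library(RCurl)

# Named vector of dataset URLs (add or swap filenames as needed)
urls <- c(labr71a = "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng",
          labr72  = "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr72-eng.htm&lan=eng")

# Download and parse each one; lapply keeps the names, so
# URLout$labr71a and URLout$labr72 are the individual data frames
URLout <- lapply(urls, function(u) {
  x <- getURL(u)
  read.csv(textConnection(x), row.names = NULL, skip = 2)
})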
In a further effort to obtain useful colnames:
URL <- "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"
x <- getURL(URL)
URLout <- read.csv(textConnection(x),row.names=NULL, skip=2)
The argument skip = 2 will skip the first 2 rows when reading in the CSV and will yield some header names. Because the headers are numbers, an X will be placed in front of each. Row 2 in this case will have the value "number" in the second column. Unfortunately it appears this data was intended for use within Excel, which is really sad.
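If the X prefixes bother you, you could either strip them after reading, or pass check.names = FALSE to read.csv so the numeric headers are kept exactly as they appear in the file; a rough sketch of both options:

library(RCurl)

URL <- "http://www.statcan.gc.ca/cgi-bin/sum-som/fl/cstsaveascsv.cgi?filename=labr71a-eng.htm&lan=eng"
x <- getURL(URL)
URLout <- read.csv(textConnection(x), row.names = NULL, skip = 2)

# Drop the leading "X" that read.csv adds to purely numeric headers
names(URLout) <- sub("^X", "", names(URLout))

# Or keep the headers verbatim at read time instead:
# URLout <- read.csv(textConnection(x), row.names = NULL, skip = 2, check.names = FALSE)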