I'm trying to automatically download documents for Oil & Gas wells from the Colorado Oil and Gas Conservation Commission (COGCC) using the "rvest" and "downloader" packages in R.
The link to the table/form that contains the documents for a particular well is: http://ogccweblink.state.co.us/results.aspx?id=12337064
The "id=12337064" parameter is the unique identifier for the well.
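Since the well ID is just a query parameter, the results-page URL for any well can be built from the ID alone. A minimal sketch in base R; `well_results_url()` is a hypothetical helper name, and it assumes every well follows the same "results.aspx?id=" pattern:

```r
# Hypothetical helper: build the COGCC results-page URL for one or more wells.
# Assumes every well uses the results.aspx?id=<well id> pattern shown above.
well_results_url <- function(well_id) {
  paste0("http://ogccweblink.state.co.us/results.aspx?id=", well_id)
}

well_results_url(12337064)
#> [1] "http://ogccweblink.state.co.us/results.aspx?id=12337064"
```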
The documents on the form page can be downloaded by clicking them, for example: http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=3172781
The "DocumentId=3172781" parameter is the unique ID of the document to be downloaded, in this case an xlsm file. Other file formats on the document page include PDF and xls.
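Because the document ID is also a plain query parameter, it can be parsed out of a link's href and turned back into a download URL. A sketch with two hypothetical base-R helpers; it assumes the parameter is always written as "DocumentId=" followed by digits (matched case-insensitively):

```r
# Hypothetical helper: extract the DocumentId from a link's href (NA if absent).
document_id_from_href <- function(href) {
  has_id <- grepl("DocumentId=[0-9]+", href, ignore.case = TRUE)
  ids <- sub(".*DocumentId=([0-9]+).*", "\\1", href, ignore.case = TRUE)
  ifelse(has_id, ids, NA_character_)
}

# Hypothetical helper: build the download URL for one or more document IDs.
document_download_url <- function(doc_id) {
  paste0("http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=", doc_id)
}

document_id_from_href("DownloadDocument.aspx?DocumentId=3172781")
#> [1] "3172781"
```

Parsing the href this way avoids picking up stray digits from elsewhere in the link's HTML.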
So far I've been able to write code that downloads any document for any well, but only from the first page of the table. Most wells have documents spread over multiple pages, and I'm unable to download documents on pages other than page 1 (all the document pages have a similar URL).
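One possible explanation, not confirmed against this site: if the page-number links under the table are ASP.NET postback links (an href like javascript:__doPostBack(...)), clicking them re-submits the page's hidden form fields instead of loading a new URL, which would fit with the pages all looking alike. Under that assumption, the sketch below fetches page 1 with httr::GET, reads the hidden inputs (__VIEWSTATE, __EVENTVALIDATION and so on), and POSTs them back with __EVENTTARGET and __EVENTARGUMENT set to the next "Page$N". The helper names are hypothetical, and the grid's control ID is read from the links rather than hard-coded:

```r
library(xml2)  # the HTML parser that rvest is built on

# ASSUMPTION: the results grid pages via ASP.NET postbacks, i.e. the page-number
# links look like javascript:__doPostBack('<grid id>','Page$2').

# Return the postback target and argument of every "Page$N" link on a parsed page.
pager_postbacks <- function(page) {
  hrefs <- xml_attr(xml_find_all(page, "//a[contains(@href, '__doPostBack')]"), "href")
  hrefs <- hrefs[grepl("'Page\\$[0-9]+'", hrefs)]
  data.frame(
    target = sub(".*__doPostBack\\('([^']*)'.*", "\\1", hrefs),
    argument = sub(".*'(Page\\$[0-9]+)'.*", "\\1", hrefs),
    stringsAsFactors = FALSE
  )
}

# Copy every hidden <input> (__VIEWSTATE, __EVENTVALIDATION, ...) into a named list
# and set the two fields that the browser's __doPostBack() would fill in.
postback_fields <- function(page, event_target, event_argument) {
  hidden <- xml_find_all(page, "//input[@type='hidden']")
  values <- xml_attr(hidden, "value")
  values[is.na(values)] <- ""
  fields <- as.list(values)
  names(fields) <- xml_attr(hidden, "name")
  fields[["__EVENTTARGET"]] <- event_target
  fields[["__EVENTARGUMENT"]] <- event_argument
  fields
}

# Fetch page 1 with GET, then keep POSTing "Page$<n+1>" until no such link exists.
# httr reuses one connection handle per host, so a session cookie (if any) is kept.
fetch_all_result_pages <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)
  page <- read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
  pages <- list(page)
  repeat {
    links <- pager_postbacks(page)
    nxt <- links[links$argument == paste0("Page$", length(pages) + 1), ]
    if (nrow(nxt) == 0) break
    resp <- httr::POST(url,
                       body = postback_fields(page, nxt$target[1], nxt$argument[1]),
                       encode = "form")
    httr::stop_for_status(resp)
    page <- read_html(httr::content(resp, as = "text", encoding = "UTF-8"))
    pages[[length(pages) + 1]] <- page
  }
  pages
}

# Usage (requires network access):
# pages <- fetch_all_result_pages("http://ogccweblink.state.co.us/results.aspx?id=12337064")
```

Each parsed page in the returned list could then go through the same document-ID extraction used on page 1. Each POST uses the hidden fields of the page just received, because ASP.NET usually rejects a stale __VIEWSTATE.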
```r
## Extract the document ID for the document to be downloaded, in this case "DIRECTIONAL DATA".
## The CSS path was found with the SelectorGadget tool.
library(rvest)
page <- read_html("http://ogccweblink.state.co.us/results.aspx?id=12337064")  # read_html() replaces the deprecated html()
link <- html_nodes(page, "tr:nth-child(24) td:nth-child(4) a")
href <- html_attr(link[[1]], "href")  # e.g. "DownloadDocument.aspx?DocumentId=3172781"
DocId <- sub(".*DocumentId=([0-9]+).*", "\\1", href, ignore.case = TRUE)
DocId
#> [1] "3172781"
```
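The tr:nth-child(24) selector ties the extraction to one row position on one page. A sketch of a position-independent alternative that collects every link whose href contains "DownloadDocument.aspx"; the helper name document_links() is hypothetical, and the selector assumes the download links on other pages look like the one on page 1:

```r
library(rvest)

# Return one row per document link on a parsed results page.
# Assumes each document is an <a> whose href contains "DownloadDocument.aspx"
# (CSS attribute matching is case-sensitive).
document_links <- function(page) {
  links <- html_nodes(page, "a[href*='DownloadDocument.aspx']")
  href <- html_attr(links, "href")
  data.frame(
    link_text = trimws(html_text(links)),
    doc_id = sub(".*DocumentId=([0-9]+).*", "\\1", href, ignore.case = TRUE),
    stringsAsFactors = FALSE
  )
}

# Usage (requires network access):
# docs <- document_links(read_html("http://ogccweblink.state.co.us/results.aspx?id=12337064"))
# docs$doc_id
```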
```r
## To download the document, I use the downloader package
library(downloader)
linkDocId <- paste0("http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=", DocId)
download(linkDocId, "DIRECTIONAL DATA.xlsm", mode = "wb")  # "wb" keeps the binary file intact
#> trying URL 'http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=3172781'
#> Content type 'application/octet-stream' length 33800 bytes (33 KB)
#> downloaded 33 KB
```
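To fetch many documents in one go, the single download() call can be wrapped in a loop. A sketch using base R's utils::download.file (which downloader::download wraps); the function name, the "<DocumentId>.<ext>" file naming, and the caller-supplied extension are assumptions, since the download URL alone does not reveal whether a file is xlsm, xls, or PDF. The downloader argument exists so the loop can be tested without network access:

```r
# Hypothetical helper: download each DocumentId into dest_dir and report success.
# File names are "<DocumentId>.<ext>"; ext is supplied by the caller (recycled).
download_documents <- function(doc_ids, dest_dir = ".", ext = "bin",
                               downloader = utils::download.file) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  ext <- rep_len(ext, length(doc_ids))
  paths <- character(length(doc_ids))
  ok <- logical(length(doc_ids))
  for (i in seq_along(doc_ids)) {
    url <- paste0("http://ogccweblink.state.co.us/DownloadDocument.aspx?DocumentId=", doc_ids[i])
    paths[i] <- file.path(dest_dir, paste0(doc_ids[i], ".", ext[i]))
    # mode = "wb" writes the bytes unchanged, which matters for xlsm/xls/PDF on Windows.
    ok[i] <- tryCatch({
      status <- downloader(url, destfile = paths[i], mode = "wb")
      isTRUE(status == 0)
    }, error = function(e) {
      message("Failed to download ", url, ": ", conditionMessage(e))
      FALSE
    })
  }
  data.frame(doc_id = doc_ids, path = paths, ok = ok, stringsAsFactors = FALSE)
}
```

For example, download_documents("3172781", dest_dir = "well_12337064", ext = "xlsm") would save the directional data as well_12337064/3172781.xlsm. A failed download is reported and skipped rather than stopping the whole loop.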
Does anyone know how I can modify my code to download documents on other pages?
Many thanks!
Em