
Warning: newbie here. I would appreciate some guidance. I am investing time in learning how to use R to automate downloads.

What I need: to download data on shale gas wells from this website for all counties and reporting periods: https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCounty.aspx (Note: you may be asked to accept an agreement when entering the site; that is not a big deal.)

I can get to the page that lists all the CSV files I want to download. Unfortunately, that page has the same address as the one above. (You can choose a county and a reporting period and see for yourself.)

However, once on that page, the links that trigger the CSV downloads are listed. Each of them looks something like this: https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY
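Since every export link seems to follow the same pattern, the URLs for several counties could be generated rather than copied by hand. This is only a sketch: `build_export_url()` is a hypothetical helper, "GREENE" is an illustrative second county, and the only period code taken from the link above is `15AUGU` (the codes for other periods would have to be read off the site).

```r
# Base address of the export page, copied from the link above.
export_base <- "https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx"

# Hypothetical helper (not from any package): builds one export URL per county.
build_export_url <- function(county, period,
                             unconventional_only = FALSE,
                             home_use_wells = TRUE,
                             non_producing_wells = TRUE) {
  flag <- function(x) tolower(as.character(x))  # TRUE -> "true", FALSE -> "false"
  # Percent-encode each county name in case it contains spaces or symbols.
  county <- vapply(county, utils::URLencode, character(1),
                   reserved = TRUE, USE.NAMES = FALSE)
  paste0(
    export_base,
    "?UNCONVENTIONAL_ONLY=", flag(unconventional_only),
    "&INC_HOME_USE_WELLS=", flag(home_use_wells),
    "&INC_NON_PRODUCING_WELLS=", flag(non_producing_wells),
    "&PERIOD=", period,
    "&COUNTY=", county
  )
}

# "GREENE" is only an illustrative second county; "15AUGU" comes from the link above.
counties <- c("ALLEGHENY", "GREENE")
urls <- build_export_url(counties, "15AUGU")
print(urls)
```

Building the URLs is the easy part; they still have to be requested from a session that has accepted the agreement.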

What I have tried:

library(downloader)

# Try to fetch the export link straight into a local CSV file
download("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY",
         destfile = "Prod_AUG15_Allegheny.csv")

I have followed what another person did here: Download documents from aspx web page in R

The problem: this command saves the web page (HTML) instead of the CSV file.

trying URL 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY'
Content type 'text/html; charset=utf-8' length 11592 bytes (11 Kb)
opened URL
downloaded 11 Kb

The question: is this related to my page being served over https instead of http? Any guidance on how to solve it, or pointers to other relevant posts? (I found some posts on aspx downloads, but nothing helpful.)
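The `Content type 'text/html; charset=utf-8'` line in the log already suggests that the server answered with a page rather than with data. A quick way to confirm this for any saved file might look like the sketch below. `is_html_file()` is a hypothetical helper, and the two temporary files stand in for a real download so that the snippet runs on its own.

```r
# Hypothetical helper: reads the first lines of a file and reports whether
# they look like an HTML page rather than CSV data.
is_html_file <- function(path, n = 20L) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  any(grepl("<!DOCTYPE html|<html", first_lines, ignore.case = TRUE))
}

# Stand-in files for the demonstration (not real downloads from the site).
html_path <- tempfile(fileext = ".csv")
writeLines(c("<!DOCTYPE html>", "<html><body>Agreement</body></html>"), html_path)

csv_path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2"), csv_path)

print(is_html_file(html_path))
print(is_html_file(csv_path))
```

Running the same check on `Prod_AUG15_Allegheny.csv` would show whether the saved file is the agreement page or the actual data.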

Thanks in advance

Pladiona
  • It's using SharePoint and is tracking both session info and "view state" info (there's a special place in hades for microsoft web ppl). You'll need to use selenium and use "clicks" to automate data downloads. – hrbrmstr Nov 11 '15 at 17:16
  • Thanks hrbrmstr! I am checking this option, but I would prefer to try with R, since I am investing time in learning it. However, I understand now that I have to give the view state and session info to R when it opens the URL. Will look for info about that. Any other comments welcome! – Pladiona Nov 12 '15 at 16:22
  • oh you can still do it in R https://cran.rstudio.org/web/packages/RSelenium/vignettes/RSelenium-basics.html – hrbrmstr Nov 12 '15 at 16:23
  • Oh Telepathy here. I'll look for that, thanks! – Pladiona Nov 12 '15 at 16:24

1 Answer


@hrbrmstr It worked! Not the way I wanted at the beginning, but with RSelenium I could click the button to accept the agreement and actually open the download link.

Here is the code (it is simple, but it took me all day to figure out, what a shame):

# Using RSelenium to open the download link
# Install the package if needed
if (!requireNamespace("RSelenium", quietly = TRUE)) install.packages("RSelenium")
# Load the package
library("RSelenium")
# Download and start the Selenium server.
# Note: checkForServer() and startServer() are defunct in newer RSelenium
# releases; rsDriver() is the replacement there.
checkForServer()
startServer()
# I had to start the server manually!
remDr <- remoteDriver()
remDr  # print the driver settings
remDr$open()
# Open the agreement page and accept the conditions
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Welcome/Agreement.aspx")
AgreeButton <- remDr$findElement(using = "id", value = "MainContent_AgreeButton")
AgreeButton$highlightElement()
AgreeButton$clickElement()

# With the agreement accepted in this session, open the CSV export link
remDr$navigate("https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Production/ProductionByCountyExport.aspx?UNCONVENTIONAL_ONLY=false&INC_HOME_USE_WELLS=true&INC_NON_PRODUCING_WELLS=true&PERIOD=15AUGU&COUNTY=ALLEGHENY")

However, I am not able to save the CSV file :-(. I know I need a command equivalent to "Save link as...", but I am asking about that in another topic related to RSelenium.
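One idea I have not verified against this site: since the agreement is tied to the browser session, the session cookies could perhaps be copied out of the browser and sent along with a plain `download.file()` call. The sketch below assumes that `remDr$getAllCookies()` returns a list of cookies with `name` and `value` fields and that the site accepts them outside the browser. `cookie_header()` is a hypothetical helper, and the example cookies are invented so that the snippet runs without a browser.

```r
# Hypothetical helper: turns a list of cookies (each a list with `name` and
# `value`, the shape assumed for remDr$getAllCookies()) into one Cookie header.
cookie_header <- function(cookies) {
  pairs <- vapply(cookies,
                  function(ck) paste0(ck$name, "=", ck$value),
                  character(1))
  paste(pairs, collapse = "; ")
}

# Invented stand-in cookies; the real names and values come from the browser.
example_cookies <- list(
  list(name = "session", value = "abc123"),
  list(name = "agreed", value = "1")
)
header_value <- cookie_header(example_cookies)
print(header_value)

# With a live RSelenium session (assumption, untested against this site;
# the `headers` argument needs R >= 3.6.0):
# download.file(export_url,  # the export link opened with remDr$navigate()
#               destfile = "Prod_AUG15_Allegheny.csv", mode = "wb",
#               headers = c(Cookie = cookie_header(remDr$getAllCookies())))
```

If the server also checks the view state, this will not be enough and the download would have to happen inside the browser itself.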

Will edit the answer when I find out!

Pladiona