You can download the file by setting a UserAgent as follows:
require(httr)
UA <- "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
my_url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"
doc <- GET(my_url, user_agent(UA))
Now have a look at content(doc, "text")
to see that it is the file you see in the browser
Then you can parse it via XML
or xml2
. I find xml2
easier but that is just my taste. Both work.
data <- XML::xmlParse(content(doc, "text"))
data2 <- xml2::read_xml(content(doc, "text"))
Why do i have to use a user agent?
From the RCurl FAQ: http://www.omegahat.org/RCurl/FAQ.html
Why doesn't RCurl provide a default value for the useragent that some sites require?
This is a matter of philosophy. Firstly, libcurl doesn't specify a default value and it is a framework for others to build applications. Similarly, RCurl is a general framework for R programmers to create applications to make "Web" requests. Accordingly, we don't set the user agent either. We expect the R programmer to do this. R programmers using RCurl in an R package to make requests to a site should use the package name (and also the version of R) as the user agent and specify this in all requests.
Basically, we expect others to specify a meaningful value for useragent so that they identify themselves correctly.
Note that users (not recommended for programmers) can set the R option named RCurlOptions via R's option() function. The value should be a list of named curl options. This is used in each RCurl request merging these values with those specified in the call. This allows one to provide default values.
I suspect http://forecast.weather.gov/ to reject all requests without a UserAgent.
Access Denied
\n \nYou don't have permission to access \"http://forecast.weather.gov/MapClick.php?\" on this server.\nReference #18.8d070f17.1439733211.9c98310f\n\n\n"
– twroye Aug 16 '15 at 13:53