
I am trying to download weather data, similar to the question asked here: How to parse XML to R data frame. But when I run the xmlParse() line from that example, I get "Error: 1: failed to load HTTP resource". I've checked that the URL is valid. Here is the line I'm referring to:

library(XML)
data <- xmlParse("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")

I've managed to find a work around with the following, but would like to understand why the first line didn't work.

testfile <- "G:/Self Improvement/R Working Directory/test.xml"
url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"
download.file(url, testfile, mode="wb") # download the data into testfile
data <- xmlParse(testfile)

Appreciate any insights.

twroye
  • Perhaps the file needs to be present on the filesystem. – IgnazioC Aug 14 '15 at 22:39
  • When that happens, try `RCurl::getURL()` to get the HTML as text first: `txt <- RCurl::getURL(url)`, then `data <- xmlParse(txt)`. It's actually safer to just do it that way every time. – Rich Scriven Aug 14 '15 at 22:51
  • Unfortunately that did not work; the code returns an HTML "Access Denied" page: "You don't have permission to access \"http://forecast.weather.gov/MapClick.php?\" on this server. Reference #18.8d070f17.1439733211.9c98310f" – twroye Aug 16 '15 at 13:53

2 Answers


You can download the file by setting a user agent, as follows:

require(httr)
UA <- "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
my_url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"
doc <- GET(my_url, user_agent(UA))

Now have a look at `content(doc, "text")` to see that it is the file you see in the browser.
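
For instance, a quick sanity check (a sketch using httr's `status_code()` helper):

status_code(doc)                          # should be 200 once the user agent is accepted
cat(substr(content(doc, "text"), 1, 300)) # peek at the first few hundred characters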

Then you can parse it via XML or xml2. I find xml2 easier, but that is just my taste; both work.

data <- XML::xmlParse(content(doc, "text"))
data2 <- xml2::read_xml(content(doc, "text"))
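
As a rough sketch of what you can do with the parsed document afterwards (the XPath expressions assume the DWML layout this feed returns, i.e. start-valid-time nodes and an hourly temperature block; adjust them if your feed differs):

times <- xml2::xml_text(xml2::xml_find_all(data2, "//time-layout/start-valid-time"))
temps <- xml2::xml_text(xml2::xml_find_all(data2, "//temperature[@type='hourly']/value"))
head(data.frame(start_valid_time = times, hourly_temperature = temps))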

Why do I have to use a user agent?
From the RCurl FAQ: http://www.omegahat.org/RCurl/FAQ.html

Why doesn't RCurl provide a default value for the useragent that some sites require?
This is a matter of philosophy. Firstly, libcurl doesn't specify a default value and it is a framework for others to build applications. Similarly, RCurl is a general framework for R programmers to create applications to make "Web" requests. Accordingly, we don't set the user agent either. We expect the R programmer to do this. R programmers using RCurl in an R package to make requests to a site should use the package name (and also the version of R) as the user agent and specify this in all requests.
Basically, we expect others to specify a meaningful value for useragent so that they identify themselves correctly.

Note that users (not recommended for programmers) can set the R option named RCurlOptions via R's option() function. The value should be a list of named curl options. This is used in each RCurl request merging these values with those specified in the call. This allows one to provide default values.

I suspect http://forecast.weather.gov/ rejects all requests without a user agent.
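
If you want to stay on the RCurl route described in the FAQ quote, a minimal sketch (the user-agent string is just a placeholder; note that `options(RCurlOptions = ...)` only affects RCurl calls such as `getURL()`, not `xmlParse()`'s own HTTP fetcher, so the document is still fetched as text first):

library(RCurl)
library(XML)
options(RCurlOptions = list(useragent = "R (my-weather-script)")) # merged into every RCurl request
txt <- getURL("http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML")
data <- xmlParse(txt)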

Rentrop
  • So should we assume that setting the user agent in `RCurlOptions` allows `xmlParse` to query the URL directly? – Tensibai Jan 13 '16 at 09:30
  • I guess so. Try it out. – Rentrop Jan 13 '16 at 09:31
  • I think you're mistaking me for the question OP; I'm not. I asked out of curiosity to know whether this would solve the problem, so future readers can have a full solution and not just a hint. – Tensibai Jan 14 '16 at 11:10
  • That worked, thank you very much. Another solution I found was to use `curl_fetch_memory()` from the curl package: `require(curl); my_url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"; curl_data <- curl_fetch_memory(url = my_url); data <- rawToChar(curl_data$content)` – twroye Mar 16 '17 at 20:01

I downloaded the URL to a text file, then read the file's contents and parsed them as XML. Here is my code:

library(XML)

url <- "http://forecast.weather.gov/MapClick.php?lat=29.803&lon=-82.411&FcstType=digitalDWML"

download.file(url = url, destfile = "url.txt") # save the response to a local file
data <- xmlParse("url.txt")                    # parse the local copy

xml_data <- xmlToList(data) # convert the XML tree into nested lists

location <- as.list(xml_data[["data"]][["location"]][["point"]]) # latitude/longitude of the forecast point

start_time <- unlist(xml_data[["data"]][["time-layout"]][
    names(xml_data[["data"]][["time-layout"]]) == "start-valid-time"]) # one timestamp per forecast row
Tunaki
  • `temps <- xml_data[["data"]][["parameters"]]; temps <- temps[names(temps) == "temperature"]; temps <- temps[sapply(temps, function(x) any(unlist(x) == "hourly"))]; temps <- unlist(temps[[1]][sapply(temps, names) == "value"])` – Duc Thinh Truong Jan 13 '16 at 09:13
  • `out <- data.frame(as.list(location), "start_valid_time" = start_time, "hourly_temperature" = temps); head(out)` – Duc Thinh Truong Jan 13 '16 at 09:13
  • Edit your answer with your comments and format it properly. I've already formatted your current post, try to do the same. – Tunaki Jan 13 '16 at 09:14
  • And this is exactly what is already in the question, and thus does not answer why xmlParse doesn't work on the URL directly. – Tensibai Jan 13 '16 at 09:17
  • This was the workaround posted in my original question. – twroye Jan 13 '16 at 23:43