With @SchaunW's help, I was able to figure out how to parse XML to an R data frame.

But my data requires parsing more than one XML document. My code is below. It runs fine for the first couple of stations, but when I run it for all 500 stations, this error pops up:

 "Error in temps.i[sapply(temps.i, function(x) any(unlist(x) == "hourly"))] : 
  invalid subscript type 'list'"

Please help, thanks!

library(XML)

data.all = data.frame()
lat = data.0$lat
lon = data.0$lon
head(data.0)
station_id  LocID   lat    lon
   10001    11694  32.82  -86.65
   10079   089214  27.65  -80.23   (the 'temperature' node does not exist in this station's XML)

data.loop <- lapply(1:length(data.0$station_id), function(i) {
  urls.i <- paste("http://forecast.weather.gov/MapClick.php?lat=", lat[i],
                  "&lon=", lon[i], "&FcstType=digitalDWML", sep="")
  data.i <- xmlParse(urls.i)
  xml_data.i <- xmlToList(data.i)
  location.i <- as.list(xml_data.i[["data"]][["location"]][["point"]])
  start_time.i <- unlist(xml_data.i[["data"]][["time-layout"]][
    names(xml_data.i[["data"]][["time-layout"]]) == "start-valid-time"])
  temps.i <- xml_data.i[["data"]][["parameters"]]
  temps.i <- temps.i[names(temps.i) == "temperature"]
  temps.i <- temps.i[sapply(temps.i, function(x) any(unlist(x) == "hourly"))]
  temps.i <- unlist(temps.i[[1]][sapply(temps.i, names) == "value"])
  data1.i <- data.frame(as.list(location.i), "hh" = start_time.i, "Temp" = temps.i)
})

data.all <- as.data.frame(do.call(rbind, data.loop))
Rosa
  • You should provide more data to reproduce your error. More generally, when debugging R code, `options(error = recover)` is your friend [use `options(error = NULL)` to revert it]. – agstudy Jun 24 '13 at 20:24
  • Thanks, I found the error: in some of the XML documents the node "temperature" does not exist. I searched around but cannot find how to deal with missing XML nodes in R. I'll edit the post; any suggestion would be great. – Rosa Jun 25 '13 at 18:28
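
The failure described in the comment above can be reproduced without any network access. The following is only a minimal sketch: `params` is a hypothetical stand-in for `xml_data.i[["data"]][["parameters"]]` at a station with no temperature node, and the `vapply` guard is one possible workaround, not necessarily the fix used in the answer.

```r
# Hypothetical stand-in for the parsed <parameters> list of a station
# whose XML has no <temperature> node.
params <- list(`wind-speed` = list(value = "5"))

temps <- params[names(params) == "temperature"]   # empty list

# sapply() over an empty list returns list(), not logical(0)
idx <- sapply(temps, function(x) any(unlist(x) == "hourly"))

# Subsetting with a list is what raises "invalid subscript type 'list'"
err <- tryCatch(temps[idx], error = function(e) conditionMessage(e))

# vapply() always returns the declared type, so the subscript stays valid
idx_safe <- vapply(temps, function(x) any(unlist(x) == "hourly"), logical(1))
hourly <- temps[idx_safe]

# Fall back to NA when there is no hourly temperature series
temp_values <- if (length(hourly) > 0) unlist(hourly[[1]]) else NA
```

Checking `length(hourly)` before indexing with `[[1]]` keeps the loop going for stations that lack the node.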

1 Answer

I tried to simplify your code and added a check for whether the temperature node exists:

library(XML)

data.0 <- read.table(text='station_id  LocID   lat    lon
                 10001    11694  32.82  -86.65
                 10079   089214  27.65  -80.23', header=TRUE)

res <- apply(data.0, 1, function(row) {
  tryCatch({
    url <- paste("http://forecast.weather.gov/MapClick.php?lat=",
                 row['lat'], "&lon=",
                 row['lon'], "&FcstType=digitalDWML", sep="")
    doc <- xmlParse(url)
    data <- xmlToList(doc)$data
    location <- data$location$point
    # keep every start-valid-time entry, not only the first one
    tl <- data$`time-layout`
    start_valid_time <- unlist(tl[names(tl) == "start-valid-time"])
    if ("temperature" %in% names(data$parameters)) {
      # first temperature node; its readings are the elements named "value"
      templ <- data$parameters$temperature
      temps <- as.numeric(unlist(templ[names(templ) == "value"]))
    } else {
      temps <- NA   # this station has no temperature node
    }
    data.frame(as.list(location), hh = start_valid_time, Temps = temps)
  },
  # on failure, return the same columns as above so that rbind still works;
  # this assumes the point attributes are named latitude and longitude
  error = function(e) data.frame(latitude = row[['lat']], longitude = row[['lon']],
                                 hh = NA, Temps = NA))
})

do.call(rbind, res)
agstudy
  • Hi agstudy, the first time the code runs fine, but from the second time on this error appears: "Document is empty, Start tag expected, '<' not found". Do you have any idea why? – Rosa Jun 27 '13 at 17:53
  • @Rosa Do you mean that if I try the code above again I will get an error? Or if I run it twice? – agstudy Jun 27 '13 at 19:18
  • Yeah, I tried with only 2 rows and it worked fine, but when I run the entire 500 rows the error appears again, and it took about 10 minutes... – Rosa Jun 27 '13 at 19:59
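
The thread does not establish why "Document is empty" shows up on the 500-row run. One possibility (an assumption, not something confirmed here) is that the server sometimes returns an empty response when many requests arrive in quick succession. If so, pausing and retrying each request might help. The helper `with_retry` below is hypothetical and only sketches the idea; the `flaky` function stands in for a call such as `xmlParse(url)`.

```r
# Hypothetical helper: call f(), retrying up to `tries` times and
# sleeping `wait` seconds between failed attempts.
with_retry <- function(f, tries = 3, wait = 1) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < tries) Sys.sleep(wait)
  }
  stop(result)   # every attempt failed: re-raise the last error
}

# Network-free demonstration: fails twice, then succeeds
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("Document is empty")
  "parsed"
}
out <- with_retry(flaky, tries = 3, wait = 0)
```

In the answer's code this would be used as `doc <- with_retry(function() xmlParse(url))`, with a `wait` long enough to be polite to the server.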