Read multidimensional NetCDF as data frame in R

Question

I use a netCDF file which stores one variable and has following dimensions: lon, lat, time. Generally speaking I wish to compare it against different data that I have already in R stored as dataframe - first two columns are coordinates in WGS84, and next are values for specific time.

So I wrote following code.

# since # ncFile$dim$time$units  say:  [1] "days since 1900-1-1"
daysFromDate <- function(data1, data2="1900-01-01")
{
  round(as.numeric(difftime(data1,data2,units = "days")))
}

#study area:
lon <- c(40.25, 48)
lat <- c(16, 24.25)
myTime <- c(daysFromDate("2008-01-16"), daysFromDate("2011-12-31"))
varName <- "spei"



require(ncdf4)
require(RCurl)
x <- getBinaryURL("http://digital.csic.es/bitstream/10261/104742/3/SPEI_01.nc")
ncFile <- nc_open(x)

LonIdx <- which( ncFile$dim$lon$vals >= lon[1] | ncFile$dim$lon$vals <= lon[2])
LatIdx <- which( ncFile$dim$lat$vals >= lat[1] & ncFile$dim$lat$vals <= lat[2])
TimeIdx <- which( ncFile$dim$time$vals >= myTime[1] & ncFile$dim$time$vals <= myTime[2])
MyVariable <- ncvar_get( ncFile, varName)[ LonIdx, LatIdx, TimeIdx]

I thought that data frame will be returned so that I will be able to easily manipulate data (in example - check correlation or create a plot). Unfortunately 3-dimensional list has been returned instead. How can I reformat this to data frame with following columns X-Y-Time1-Time2-...

So, example data will looks as follows

X Y 2014-01-01 2014-01-02 2014-01-02

50 17 0.5 0.4 0.3

where 0.5, 0.4 and 0.3 are example variable values

Or maybe there is different solution?

@AF7 What advantages I will gain by using raster package? Will it be easier/better? — matandked, Jan 25 '16 at 04:55
I find it much easier. It is very intuitive, and many functions for layer correlation, cropping, cross-layer maths, random-forest, plotting, remapping,merging and much more are already implemented. — AF7, Jan 26 '16 at 08:18

Severin Pappadeux · Accepted Answer · 2016-01-28T05:36:49.023

Ok, try following code, but it assumes that ranges are dense filled. And I changed lon test from or to and

require(ncdf4)

nc <- nc_open("SPEI_01.nc")
print(nc)

lon <- ncvar_get(nc, "lon")
lat <- ncvar_get(nc, "lat")
time <- ncvar_get(nc, "time")


lonIdx <- which( lon >= 40.25 & lon <= 48.00)
latIdx <- which( lat >= 16.00 & lat <= 24.25)

myTime <- c(daysFromDate("2008-01-16"), daysFromDate("2011-12-31"))
timeIdx <- which(time >= myTime[1] & time <= myTime[2])

data <- ncvar_get(nc, "spei")[lonIdx, latIdx, timeIdx]

indices <- expand.grid(lon[lonIdx], lat[latIdx], time[timeIdx])
print(length(indices))
class(indices)
summary(indices)
str(indices)

df <- data.frame(cbind(indices, as.vector(data)))
summary(df)
str(df)

UPDATE

ok, looks like I got the idea what do you want, but have do direct solution. What I've got so far is this: split data frame using either split() function or data.table package. After splitting by X&Y, you'll get lists of small data frames where X&Y are a constant for a given frame. Probably is it possible to transpose and recombine them back, but I have no idea how. It might be a good idea to continue to work with data as columns, Lists are nested, could be flattened, and here is link for splitting in R: http://www.uni-kiel.de/psychologie/rexrepos/posts/dfSplitMerge.html

Code, as continued from previous example

require(data.table)

colnames(df) <- c("X","Y","Time","spei")
df$Time <- as.Date(df$Time, origin="1900-01-01")

dt <- as.data.table(df)
summary(dt)

# Taken from https://github.com/Rdatatable/data.table/issues/1389
# x data.table
# f use `by` argument instead - unlike data.frame
# drop logical default FALSE will include `by` columns in resulting data.tables - unlike data.frame
# by character column names on which split into lists
# flatten logical default FALSE will result in recursive nested list having data.table as leafs
# ... ignored
split.data.table <- function(x, f, drop = FALSE, by, flatten = FALSE, ...){
    if(missing(by) && !missing(f)) by = f
    stopifnot(!missing(by), is.character(by), is.logical(drop), is.logical(flatten), !".ll" %in% names(x), by %in% names(x), !"nm" %in% by)
    if(!flatten){
        .by = by[1L]
        tmp = x[, list(.ll=list(.SD)), by = .by, .SDcols = if(drop) setdiff(names(x), .by) else names(x)]
        setattr(ll <- tmp$.ll, "names", tmp[[.by]])
        if(length(by) > 1L) return(lapply(ll, split.data.table, drop = drop, by = by[-1L])) else return(ll)
    } else {
        tmp = x[, list(.ll=list(.SD)), by=by, .SDcols = if(drop) setdiff(names(x), by) else names(x)]
        setattr(ll <- tmp$.ll, 'names', tmp[, .(nm = paste(.SD, collapse = ".")), by = by, .SDcols = by]$nm)
        return(ll)
    }
}

# here is data.table split
q <- split.data.table(dt, by = c("X","Y"), drop=FALSE)
str(q)

# here is data frame split
qq <- split(df, list(df$X, df$Y))
str(qq)

"ranges are dense filled" ---> I am not sure what do you mean, could you explain, please? — matandked, Jan 24 '16 at 09:58
@matandked means in the range [min...max) condition will produce only one "continuous" set of points. Basically, you have to be sure "expand.grid" would match vector of values. BTW, did it work for you? — Severin Pappadeux, Jan 24 '16 at 19:02
Yes, this works! However the results is still not what I needed - it produce x-y-time-value data frame and I wish to have x-y-date1value-date2value etc — matandked, Jan 25 '16 at 04:51
I would add following lines to your solution to display column names and dates: colnames(df) <- c("X","Y","Time","spei") df$Time <- as.Date(df$Time,origin="1900-01-01") — matandked, Jan 25 '16 at 04:52
@matandked Yeah, skipped making column names. Wrt `wish to have x-y-date1value-date2value` it was a bit unclear what you wanted. So I concentrated on making data frame without ANY information loss (you'll get everything in netCDF, down to single bit), so further transformation could be done with just R code. It is still a bit unclear what would be date1, date2 values. Do you want to group by a date? If you clarify the question, or ask another question, I would be glad to help — Severin Pappadeux, Jan 26 '16 at 16:36
Sure, instead X Y Time Value 50 17 2014-01-01 0.5 50 17 2014-01-02 0.4 50 17 2014-01-03 0.3 I would like to have X Y 2014-01-01 2014-01-02 2014-01-03 50 17 0.5 0.4 0.3 — matandked, Jan 27 '16 at 12:00
Thank you for response, but I have no idea how to merge mentioned data quickly. I thought about writing a loop for combining them, but guess there is a better solution with apply/sapply. I will ask another question on Stackoverflow. — matandked, Jan 28 '16 at 10:53
@matandked yeah, looks like then for each mini-frame you have to transpose it and then flatten into single row, and last do `rbind()` to make final df. Please ask, I would interested in the answer(s) myself. — Severin Pappadeux, Jan 28 '16 at 18:41
http://stackoverflow.com/questions/35060764/easy-way-to-transform-data-frame-in-r-one-variable-values-as-separate-columns — matandked, Jan 28 '16 at 21:08

Read multidimensional NetCDF as data frame in R

1 Answers1