
Following a previous question (Faster reading of time series from netCDF?) I have re-permuted my netCDF files to provide fast time-series reads (scripts on github to be cleaned up eventually ...).

In short, to make reads faster, I have rearranged the dimensions from lat, lon, time to time, lat, lon. Now, my existing scripts break because they assume that the dimensions will always be lat, lon, time, following the ncdf4 documentation of ncvar_get, for the 'start' argument:

Order is X-Y-Z-T (i.e., the time dimension is last)

However, this is not the case.

Furthermore, there is a related inconsistency in the order of dimensions reported by the command-line netCDF utility ncdump -h and by the R function ncdf4::nc_open. The first says that the dimensions are in the expected (lat, lon, time) order, while the latter sees the dimensions with time first (time, lat, lon).

For a minimal example, download the file test.nc and run

bash-$ ncdump -h test.nc
bash-$ R
R> library(ncdf4)
R> print(nc_open("test.nc"))

What I want to do is get records 5-15 from the variable "lwdown"

my.nc <- nc_open("test.nc")
ncvar_get(my.nc, "lwdown", start = c(1, 1, 5), count = c(1, 1, 10))

But this doesn't work, since R sees the time dimension first, so I must change my scripts to

ncvar_get(my.nc, "lwdown", start = c(5, 1, 1), count = c(10, 1, 1))

It wouldn't be so bad to update my scripts and functions, except that I want to be able to read data from files regardless of the dimension order.

Is there a way to generalize this function so that it works independent of dimension order?

David LeBauer

2 Answers


While asking the question, I figured out this solution, though there is still room for improvement:

The closest I can get is to open the file and find the order in this way:

my.nc$var$lwdown$dim[[1]]$name
[1] "time"
my.nc$var$lwdown$dim[[2]]$name
[1] "lon"
my.nc$var$lwdown$dim[[3]]$name
[1] "lat"

which is a bit unsatisfying, although it led me to this solution:

If I want to start at c(lat = 1, lon = 1, time = 5), but ncvar_get expects an arbitrary order, I can use named vectors and reorder them to match the file:

start <- c(lat = 1, lon = 1, time = 5)
count <- c(lat = 1, lon = 1, time = 10)
dim.order <- sapply(my.nc$var$lwdown$dim, function(x) x$name)

ncvar_get(my.nc, "lwdown", start = start[dim.order], count = count[dim.order])
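
To make this reusable across variables and files, the same idea can be wrapped in a function. This is only a minimal sketch: dim_order() and ncvar_get_named() are hypothetical names I made up, not part of ncdf4, and the wrapper assumes that start and count are named with the variable's dimension names.

```r
# Hypothetical helper (not part of ncdf4): names of a variable's dimensions,
# in the order ncdf4 reports them for that variable.
dim_order <- function(nc, varname) {
  vapply(nc$var[[varname]]$dim, function(d) d$name, character(1),
         USE.NAMES = FALSE)
}

# Hypothetical wrapper around ncdf4::ncvar_get(): start and count are named
# vectors, and are reordered to match the variable before the read.
ncvar_get_named <- function(nc, varname, start, count) {
  ord <- dim_order(nc, varname)
  absent <- setdiff(ord, intersect(names(start), names(count)))
  if (length(absent) > 0) {
    stop("start/count have no entry for: ", paste(absent, collapse = ", "))
  }
  ncdf4::ncvar_get(nc, varname,
                   start = unname(start[ord]),
                   count = unname(count[ord]))
}
```

With that in place, ncvar_get_named(my.nc, "lwdown", start = c(lat = 1, lon = 1, time = 5), count = c(lat = 1, lon = 1, time = 10)) should read the same slice whichever way the file is permuted.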
David LeBauer

I ran into this recently as well. I have a netcdf with data in this format

nc_in <- nc_open("my.nc")

nc_in$dim[[1]]$name == "time"
nc_in$dim[[2]]$name == "latitude"
nc_in$dim[[3]]$name == "longitude"

nc_in$dim[[1]]$len == 3653 # this is the number of timesteps in my netcdf
nc_in$dim[[2]]$len == 180 # this is the number of latitude cells
nc_in$dim[[3]]$len == 360 # this is the number of longitude cells

The obnoxious part here is that the DIM component of the netCDF is in the order of T,Y,X

If I try to grab time series data for the pr var using the indices in the order they appear in nc_in$dim, I get an error:

ncvar_get(nc_in,"pr")[3653,180,360] # 'subscript out of bounds'

If I instead grab data in X,Y,T order, it works:

ncvar_get(nc_in,"pr")[360,180,3653] # gives me a value

What I don't understand is how the ncvar_get() function knows which dimension represents X, Y and T, especially if you have generated your own netCDF.
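
As far as I can tell, ncvar_get() does not work out which dimension is X, Y or T. It returns the array in the order of the variable's own dimensions, which are listed under nc_in$var$pr$dim, and that order need not match the file-level list in nc_in$dim. A small sketch for checking the order before indexing (var_axes() is a hypothetical helper name, not an ncdf4 function):

```r
# Hypothetical helper (not part of ncdf4): list the axes of one variable in
# the order that the array returned by ncvar_get() is indexed.
var_axes <- function(nc, varname) {
  v <- nc$var[[varname]]
  data.frame(
    position = seq_along(v$dim),
    name = vapply(v$dim, function(d) d$name, character(1), USE.NAMES = FALSE),
    len = vapply(v$dim, function(d) as.integer(d$len), integer(1),
                 USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}
```

Calling var_axes(nc_in, "pr") would then show which dimension each subscript position refers to, so the indices can be built from the names rather than assumed.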

Gigamosh57