15

I have a netCDF file that I wish to extract a subset from defined by latitude/longitude boundaries (i.e. a lat/long defined box), using the ‘ncdf’ package in R.

A summary of my netCDF file is below. It has two dimensions (latitude and longitude) and 1 variable (10U_GDS4_SFC). It is essentially a lat/long grid containing wind values:

[1] "file example.nc has 2 dimensions:"
[1] "lat_0   Size: 1280"
[1] "lon_1   Size: 2560"
[1] "------------------------"
[1] "file example.nc has 1 variables:"
[1] "float 10U_GDS4_SFC[lon_1,lat_0]  Longname:10 metre U wind component Missval:1e+30"

The latitude variable runs from +90 to -90 and the longitude variable runs form 0 to 360.

I wish to extract a subset of the overall grid using the following geographical corner boundaries:

bottom left corner: Lat: 34.5˚, Long: 355˚, top left corner: Lat: 44.5˚, Long: 355˚, top right corner: Lat: 44.5˚, Long: 12˚, bottom right corner: Lat: 34.5˚ , Long: 12˚

I am aware that parts of a variable can be extracted using the get.var.ncdf() command (example below):

z1 = get.var.ncdf(example.nc, "10U_GDS4_SFC", start=c(11,26), count=c(5,5))

However, I can’t work out how lat/long can be incorporated so that I end up with a subsetted spatial grid containing variable values. I am new to working with netCDF values in R, and any advice would be greatly appreciated. Many thanks!

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
Emily
  • 859
  • 5
  • 14
  • 31

3 Answers3

9

In principle you are 2/3 of the way there. You can of course create the starting indices using something like this:

require(ncdf4)

ncFile <- nc_open( MyNetCDF )
LonStartIdx <- which( ncFile$dim$lon$vals == 355)
LatStartIdx <- which( ncFile$dim$lat$vals == 34.5)

Do the same for the counts. Then, read the variable you want

MyVariable <- ncvar_get( ncFile, varName, start=c( LonStartIdx, LatStartIdx), count=...)

However in your case you are out of luck as far as I know. The reading / writing netcdf routines do their stuff sequentially. Your grid wraps around since you have coordinates that go from 0 - 360 in longitude and you are interested in a box that contains the zero meridian.

For you (assuming you have not too much data) it would make more sense to read in the full grid into R, and then use either subset or create indices using which and cut out your "box" in R.

ncFile <- nc_open( MyNetCDF )
LonIdx <- which( ncFile$dim$lon$vals > 355 | ncFile$dim$lon$vals < 10)
LatIdx <- which( ncFile$dim$lat$vals > 34.5 & ncFile$dim$lat$vals < 44.5)
MyVariable <- ncvar_get( ncFile, varName)[ LonIdx, LatIdx]
nc_close(ncFile)

Remark: I prefer ncdf4, I find the syntax a bit easier to remember (and there was another advantage over the older netcdf R-package that I have forgotten...)

Ok. Comments cannot be as long as I would need them, so I updated the answer No worries. Let's go through the questions step by step.

  • The which function way will work. I use it myself.
  • The data will be in a similar format as in the netcf file, but I am not too sure if there is some problem with the 0 meridian (I guess yes). You might have to swap the two halves by doing something like this (replace the corresponding line in the 2nd example)

    LonIdx <- c(which( ncFile$dim$lon$vals > 355) , which( ncFile$dim$lon$vals < 10) )
    

    This changes the order of the coordinate indices so that the Western part comes first and then the Eastern.

  • Reformatting everything to a 2x3 data frame is possible. Take the data my 2nd code example returns (will be a matrix, [lon x lat]. Also get the values of the coordinates from

    lon <- ncFile$dim$lon$val[LonIdx]
    

    (or how longitude is called in your example, same for lat). Then assemble the matrix using

    cbind( rep(lat, each=length(lon)), rep(lon,length(lat)), c(myVariable) )
    
  • The coordinates will of course be the same as in the netcdf file...

You need to samity check the last cbind, as I am only about 98% confident that I have not messed up the coordinates. In the R scripts I found on my desktop I use loops, which are... evil... This should be (a bit?) faster and is also more sensible.

Joe W
  • 124
  • 6
  • @ Joe thanks for the response. Yes the 0-360 longitude is a problem. I need each variable value to have a lat/long reference so I don't think that your which option is going to work? Do you know a way of ending up with the box either in the same format at the original netCDF or in a ?x3 dataframe with the following columns: lat, long, variable. I also need the box to have the same spacing in coordinate values as the original netCDF as it is a grid. Any further suggestions would be much appreciated. Apologies for my limited coding knowledge. – Emily Jan 22 '14 at 12:22
  • I tried to put everything into a comment, but they cannot be long enough. So I updated the answer. Hope that helps! – Joe W Jan 23 '14 at 06:01
  • @ Joe Thanks for the update. I can see how this should work well. I'm having problems getting it to work with my file at the moment, but I think that this is probably related to the file rather than the code! It gives me a great place to start! Many thanks – Emily Jan 23 '14 at 11:10
  • Where are you stuck? Maybe I can clarify the answer a bit. Good luck! – Joe W Jan 23 '14 at 13:58
  • @ Joe: thanks! It appears to be a problem with latitude. The lat data runs from +90 to -90, and even after using the 'which' command with >34.5 and <44.5 an array with latitudes from +90 to -90 is returned (i.e. it doesn't seem to recognise the limits that I am trying to set). This is probably something to do with the negative numbers? – Emily Jan 23 '14 at 16:17
  • Oops, there is an error in the which() expression, fixed it. It should be (of course) & (and) not | (or) – Joe W Jan 24 '14 at 08:24
  • @ Joe: Brilliant thanks. That works great. My only remaining question is how do you know which lat and long values relate to each other, and to the associated variable? i.e. in the netCDF file, each lat/long pair has an associated variable value, but when you isolate the lat and then the long using your code how does R know which of the values relates to which? Is it just down to the ordering? Many thanks! – Emily Jan 26 '14 at 13:56
  • Yes, it is the ordering. I usually try out what happens with the ordering by creating a matrix A=matrix(1:30,5,6) and then doing a c(A). This helps me visualise what happens (as there are two different possibillities as I belive `C` does it differently... then maybe again it is Fortran that's different (I usually try it out). – Joe W Jan 26 '14 at 19:23
  • I am reading this post with great interest as I am trying to do the same, but while trying to subset I always get a RAM size issue (and I have 80GO on my computer). Any idea how I could do the same subsetting from a heavy .nc original file? I have tried various approaches using stars & tidync, it works but then I am stuck with a file that is not .nc and therefore not suitable for other manipulations... – Recology Jul 11 '22 at 13:46
  • There's still an error in the code above with LonIdx, which should be & not | – Thomas Moore Dec 21 '22 at 02:52
5

You can also use CDO to extract the area from the bash command line first and then read the file in R:

cdo sellonlatbox,-5,12,34.5,44.5 in.nc out.nc 

I note in the above discussion that there was a problem concerning the order of the latitudes. You can also use the CDO command "invertlat" to sort that out for you.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
1

If you are using Linux this can be achieved easily using nctoolkit (https://nctoolkit.readthedocs.io/en/latest/):

import nctoolkit as nc
data = nc.open_data("example.nc")    
data.subset(lon = [-12, -5], lat = [35.4, 44.5])
Robert Wilson
  • 3,192
  • 11
  • 19