9

For the following netcdf file with daily global sea surface temperatures for 2016, I'm trying to (i) subset temporally, (ii) subset geographically, (iii) then take long-term means for each pixel and create a basic plot.

Link to file: here

library(raster)
library(ncdf4)

open the netcdf after setting my working directory

nc_data <- nc_open('sst.day.mean.2016.v2.nc')

change the time variable so it's easy to interpret

time <- ncdf4::ncvar_get(nc_data, varid="time")
head(time)

change to dates that I can interpret

time_d <- as.Date(time, format="%j", origin=as.Date("1800-01-01"))

Now I'd like to subset only September 1 to October 15, but can't figure that out...

Following temporal subset, create raster brick (or stack) and geographical subset

b <- brick('sst.day.mean.2016.v2.nc') # I would change this name to my file with time subest

subset geographically

b <- crop(b, extent(144, 146, 14, 16))

Finally, I'd like to take the average for each pixel across all my days of data, assign this to a single raster, and make a simple plot...

Thanks for any help and guidance.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
Peter Houk
  • 107
  • 1
  • 2
  • 6

2 Answers2

12

After b <- brick('sst.day.mean.2016.v2.nc'), we can type b to see information of the raster brick.

b
# class       : RasterBrick 
# dimensions  : 720, 1440, 1036800, 366  (nrow, ncol, ncell, nlayers)
# resolution  : 0.25, 0.25  (x, y)
# extent      : 0, 360, -90, 90  (xmin, xmax, ymin, ymax)
# coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
# data source : C:\Users\basaw\Downloads\sst.day.mean.2016.v2.nc 
# names       : X2016.01.01, X2016.01.02, X2016.01.03, X2016.01.04, X2016.01.05, X2016.01.06, X2016.01.07, X2016.01.08, X2016.01.09, X2016.01.10, X2016.01.11, X2016.01.12, X2016.01.13, X2016.01.14, X2016.01.15, ... 
# Date        : 2016-01-01, 2016-12-31 (min, max)
# varname     : sst 

Notice that the Date slot has information from 2016-01-01 to 2016-12-31, which means the Z values already has date information and we can use that to subset the raster brick.

We can use the getZ function to access the values stored in the Z values. Type getZ(b) we can see a series of dates.

head(getZ(b))
# [1] "2016-01-01" "2016-01-02" "2016-01-03" "2016-01-04" "2016-01-05" "2016-01-06"

class(getZ(b))
# [1] "Date"

We can thus use the following code to subset the raster brick.

b2 <- b[[which(getZ(b) >= as.Date("2016-09-01") & getZ(b) <= as.Date("2016-10-15"))]]

We can then crop the image based on the code you provided.

b3 <- crop(b2, extent(144, 146, 14, 16))

To calculate the average, just use the mean function.

b4 <- mean(b3, na.rm = TRUE)

Finally, we can plot the average.

plot(b4)

enter image description here

www
  • 38,575
  • 12
  • 48
  • 84
4

The subsetting and averaging task is easy to do in CDO:

cdo timmean -sellonlatbox,lon1,lon2,lat1,lat2 -seldate,date1,date2 in.nc out.nc

where the lon1,lon2 etc define the lon-lat area to cut out and date1,date2 are the date bounds.

You can call this command directly from R using the climate operators package as per this question.

So for example, without the piping, on 3 lines would be in R:

cdo("seldate,date1,date2",in.fname,out1.fname,debug=TRUE)
cdo("sellonlatbox,lon1,lon2,lat1,lat", out1.fname,out2.fname,debug=TRUE) 
cdo("timmean",out2.fname,out.fname,debug=TRUE) 
ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
  • 2
    some files like in CMIP5 climate models (https://esgf-node.llnl.gov/search/cmip5/) there is some time variables like average_T1 and average_T2. When I run your example I got the messages below. Is there a way to force the time variable? Is there a way to select the variable in this code? Warning (find_time_vars): Found more than one time variable, skipped variable average_T1! Warning (find_time_vars): Found more than one time variable, skipped variable average_T2! 60 cdo ntime: Processed 2 variables over 60 timesteps. – fvfaleiro Sep 04 '19 at 02:39
  • @fvfaleiro sorry just saw this now. CDO can struggle when more than one time variable is present, this is also a problem for the S2S database, and I usually try to construct my retrievals to avoid this or alternatively use the gribex or eccodes from ECWMF for manipulation in these cases. Did CDO strip out one of the time variables in this case, or did the operation work despite the warning? – ClimateUnboxed Jan 09 '20 at 14:03
  • Adrian Tompkins the operation was executed despite the warnings. The output make sense, but I am not sure if it is correct. – fvfaleiro Jan 16 '20 at 14:38
  • did the output have the same dimensions though, or did you lose one time axis? what do you see with ncdump -h? – ClimateUnboxed Jan 17 '20 at 10:51
  • The dimensions are identical, but as expected the time changed from time = UNLIMITED ; // (240 currently) to time = UNLIMITED ; // (1 currently). My main concern is about the values in each cell that I am not sure if the calculation was performed correctly with these warnings. This is the firsts lines of the ncdum -h output: netcdf pr_mean_204101-206012 { dimensions: time = UNLIMITED ; // (1 currently) bnds = 2 ; lon = 144 ; lat = 90; variables: double time(time) ; time:standard_name = "time" ;time:long_name = "time" ; time:bounds = "time_bnds"; time:units = "days since 2006-01-01 00:00:00" – fvfaleiro Jan 20 '20 at 15:52
  • I think from the error message and your reply here things have worked fine. It has averaged over the time variable and ignored the other two. The only thing that is likely "incorrect" in the new file is that the two variables average_T1 and average_T2 are likely no longer correct. I am not familiar with those in CMIP but presume they are the average timestep over a certain period. CDO hasn't touched those and simply copied them to the new file, so they are not valid in the new averaged file. If you only use "time" in your analysis of this file and ignore those two variables you are okay. – ClimateUnboxed Jan 21 '20 at 09:23