Loop through netcdf files and run calculations - Python or R

Question

This is my first time using netCDF and I'm trying to wrap my head around working with it.

I have multiple version 3 netcdf files (NOAA NARR air.2m daily averages for an entire year). Each file spans a year between 1979 - 2012. They are 349 x 277 grids with approximately a 32km resolution. Data was downloaded from here.

The dimension is time (hours since 1/1/1800) and my variable of interest is air. I need to calculate accumulated days with a temperature < 0. For example

    Day 1 = +4 degrees, accumulated days = 0
    Day 2 = -1 degrees, accumulated days = 1
    Day 3 = -2 degrees, accumulated days = 2
    Day 4 = -4 degrees, accumulated days = 3
    Day 5 = +2 degrees, accumulated days = 0
    Day 6 = -3 degrees, accumulated days = 1

I need to store this data in a new netcdf file. I am familiar with Python and somewhat with R. What is the best way to loop through each day, check the previous days value, and based on that, output a value to a new netcdf file with the exact same dimension and variable.... or perhaps just add another variable to the original netcdf file with the output I'm looking for.

Is it best to leave all the files separate or combine them? I combined them with ncrcat and it worked fine, but the file is 2.3gb.

Thanks for the input.

My current progress in python:

import numpy
import netCDF4
#Change my working DIR
f = netCDF4.Dataset('air7912.nc', 'r')
for a in f.variables:
  print(a)

#output =
     lat
     long
     x
     y
     Lambert_Conformal
     time
     time_bnds
     air

f.variables['air'][1, 1, 1]
#Output
     298.37473

To help me understand this better what type of data structure am I working with? Is ['air'] the key in the above example and [1,1,1] are also keys? to get the value of 298.37473. How can I then loop through [1,1,1]?

I know this is rather late for this thread from 2013, but I just wanted to point out that the accepted solution doesn't provide the solution to the question as posed. The question seems to want the length of each continuous period of temperatures below zero (note in the question the counter resets if the temperature exceeds zero), whereas this solution only gives the total number of days in a year that the temperature is below zero. This is not a subtle difference. If simply the total number of days is required, then the question should be edited to state this. — ClimateUnboxed, May 28 '17 at 08:16

Rich Signell · Accepted Answer · 2013-09-06T21:56:46.557

You can use the very nice MFDataset feature in netCDF4 to treat a bunch of files as one aggregated file, without the need to use ncrcat. So you code would look like this:

from pylab import *
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
f.variables.keys()

atemp = f.variables['air']
print atemp

ntimes, ny, nx = shape(atemp)
cold_days = zeros((ny,nx),dtype=int)

for i in xrange(ntimes):
    cold_days += atemp[i,:,:].data-273.15 < 0

pcolormesh(cold_days)
colorbar()

generated image of cold days

And here's one way to write the file (there might be easier ways):

# create NetCDF file
nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)

cold_days_v = nco.createVariable('cold_days', 'i4',  ( 'y', 'x'))
cold_days_v.units='days'
cold_days_v.long_name='total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'

lono = nco.createVariable('lon','f4',('y','x'))
lato = nco.createVariable('lat','f4',('y','x'))
xo = nco.createVariable('x','f4',('x'))
yo = nco.createVariable('y','f4',('y'))
lco = nco.createVariable('Lambert_Conformal','i4')

# copy all the variable attributes from original file
for var in ['lon','lat','x','y','Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for lon,lat,x and y
lono[:]=f.variables['lon'][:]
lato[:]=f.variables['lat'][:]
xo[:]=f.variables['x'][:]
yo[:]=f.variables['y'][:]

#  write the cold_days data
cold_days_v[:,:]=cold_days

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6'
nco.close()

If I try looking at the resulting file in the Unidata NetCDF-Java Tools-UI GUI, it seems to be okay: enter image description here Also note that here I just downloaded two of the datasets for testing, so I used

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')

as an example. For all the data, you could use

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.????.nc')

or

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc')

Thank you sir! That's exactly what I was looking for and much more in-depth than I was expecting. You have saved me a great deal of time. I'm always impressed by the community. — mkmitchell, Sep 09 '13 at 15:31
I've mentioned this on another thread, but it is a bummer that MFDataset won't work for NetCDF4 in python, even with some restrictions. There are lots of good examples of MFDataset usage, and these are good for the many legacy file but not for the latest standard. — Eli S, Jun 15 '16 at 17:58
I put a comment above to state that this solution (while elegant and detailed) does not answer the question posed, as it provides the total days in a year below zero, and not the length of each continuous period below freezing, which can be important for agriculture for example. — ClimateUnboxed, May 28 '17 at 08:19

jsta · Answer 2 · 2016-04-11T17:27:41.940

4

Here is an R solution.

infiles <- list.files("data", pattern = "nc", full.names = TRUE, include.dirs = TRUE)

outfile <- "data/air.colddays.nc"     

library(raster)

r <- raster::stack(infiles) 
r <- sum((r - 273.15) < 0)

plot(r)

edited Apr 11 '16 at 17:27

answered Apr 07 '16 at 02:01

jsta

3,216
25
35

ClimateUnboxed · Answer 3 · 2017-05-28T21:32:48.933

I know this is rather late for this thread from 2013, but I just want to point out that the accepted solution doesn't provide the solution to the exact question posed. The question seems to want the length of each continuous period of temperatures below zero (note in the question the counter resets if the temperature exceeds zero), which can be important for climate applications (e.g. for farming) whereas the accepted solution only gives the total number of days in a year that the temperature is below zero. If this is really what mkmitchell wants (it has been accepted as the answer) then it can be done in from the command line in cdo without having to worry about NETCDF input/output:

 cdo timsum -lec,273.15 in.nc out.nc

so a looped script would be:

files=`ls *.nc` # pick up all the netcdf files in a directory
for file in $files ; do
    # I use 273.15 as from the question seems T is in Kelvin 
    cdo timsum -lec,273.15 $file ${file%???}_numdays.nc
done

If you then want the total number over the whole period you can then cat the _numdays files instead which are much smaller:

cdo cat *_numdays.nc total.nc 
cdo timsum total.nc total_below_zero.nc

But again, the question seems to want accumulated days per event, which is different, but not provided by the accepted answer.

Loop through netcdf files and run calculations - Python or R

3 Answers3

Linked