Efficient netCDF analysis when looping through data

Question

This is a follow up question related to this question.

Thanks to previous help I have successfully imported a netCDF file (or files with MFDataset) and am able to compare the different times to one another to create another cumulative dataset. Here is a piece of the current code.

from numpy import *
import netCDF4
import os

f = netCDF4.MFDataset('air.2m.1979.nc')

atemp = f.variables['air']

ntimes, ny, nx = atemp.shape
cold_days = zeros((ntimes, ny, nx), dtype=int)

for i in range(ntimes):
  for b in range(ny):
    for c in range(nx):
      if i == 1:
          if atemp[i,b,c] < 0:
            cold_days[i,b,c] = 1
          else:
            cold_days[i,b,c] = 0
      else:
        if atemp[i,b,c] < 0:
          cold_days[i,b,c] = cold_days[i-1,b,c] + 1
        else:
          cold_days[i,b,c] = 0

This seems like a brute force way to get the job done, and though it works it takes a very long time. I'm not sure if it takes such a long time because I'm dealing with 365 349x277 matrices (35,285,645 pixels) or if my old school brute force way is simply slow in comparison to some built in python methods.

Below is an example of what I believe the code is doing. It looks at Time and increments cold days if temp < 0. If temp >= 0 than cold days resets to 0. In the below image you will see that the cell at row 2, column 1 increments each Time that passes but the cell at row 2, column 2 increments at Time 1 but resets to zero on Time 2.

Is there a more efficient way to rip through this netCDF dataset to perform this type of operation? Here is an example image

Why did you decide to loop over x and y? The answer http://stackoverflow.com/questions/18665078/loop-through-netcdf-files-and-run-calculations-python-or-r does not loop over x and y, and will be much faster. — Rich Signell, Sep 11 '13 at 14:40
for i in xrange(ntimes): cold_days += atemp[i,:,:].data-273.15 < 0 provided a single frame in time matrix with a cumulative cold days throughout the year. It didn't compare to the previous time frame and adjust accordingly. I'll try replacing x and y with : — mkmitchell, Sep 11 '13 at 15:24
I receive an error. I also tried atemp[i,:,:].data. if atemp[i,:,:] < 273.15: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() — mkmitchell, Sep 11 '13 at 15:33
I guess I still don't know what you are trying to do. You don't want the total number of days with mean temperatures less than zero? — Rich Signell, Sep 12 '13 at 00:47
I want to count the number of days with mean temperature less than zero, but in daily increments. At day 1 if the temp < 0 that day should be 1. If the next day is < 0 that day is 2, Day 1 should still be 1. However, when a day is >=0 that count should reset to 0. So if day 3 is +3 than that day should be 0. Therefore Day 1 would be 1, day 2 would be 2, and day 3 would be 0. Then if day 4 is < 0 that day would be 1 and the count starts over. The result is a netCDF file with time as a dimension. The values are accumulated cold days. — mkmitchell, Sep 12 '13 at 12:55

Rich Signell · Accepted Answer · 2013-09-12T20:32:56.247

Seems like this is a minor modification -- just write the data out at each time step. Something close to this should work:

from pylab import *
import netCDF4

# open NetCDF input files

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.19??.nc')
# print variables
f.variables.keys()

atemp = f.variables['air']
print atemp

ntimes, ny, nx = shape(atemp)
cold_days = zeros((ny,nx),dtype=int)

# create output NetCDF file

nco = netCDF4.Dataset('/usgs/data2/notebook/cold_days.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)
nco.createDimension('time',ntimes)

cold_days_v = nco.createVariable('cold_days', 'i4',  ( 'time', 'y', 'x'))
cold_days_v.units='days'
cold_days_v.long_name='total number of days below 0 degC'
cold_days_v.grid_mapping = 'Lambert_Conformal'

timeo = nco.createVariable('time','f8',('time'))
lono = nco.createVariable('lon','f4',('y','x'))
lato = nco.createVariable('lat','f4',('y','x'))
xo = nco.createVariable('x','f4',('x'))
yo = nco.createVariable('y','f4',('y'))
lco = nco.createVariable('Lambert_Conformal','i4')

# copy all the variable attributes from original file
for var in ['time','lon','lat','x','y','Lambert_Conformal']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for time, lon,lat,x and y
timeo[:] = f.variables['time'][:]
lato[:] = f.variables['lat'][:]
xo[:] = f.variables['x'][:]
yo[:] = f.variables['y'][:]

for i in xrange(ntimes):
    cold_days += atemp[i,:,:].data-273.15 < 0
    #  write the cold_days data
    cold_days_v[i,:,:]=cold_days


# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6'
nco.close()

I didn't do much testing going between my method above and Rich's method but from what I saw both appeared to work at roughly the same speed. — mkmitchell, Dec 11 '13 at 16:53

Efficient netCDF analysis when looping through data

1 Answers1