
I need help with reading multiple netCDF files; despite the few examples here, none of them works properly. I am using Python(x,y) 2.7.5 and these packages: netCDF4 1.0.7-4, matplotlib 1.3.1-4, numpy 1.8, pandas 0.12, basemap 1.0.2.

There are a few things I'm used to doing in GrADS that I need to start doing in Python. I have 2-meter temperature data from ECMWF (four records per day: 00, 06, 12 and 18 UTC; one file per year). Each file contains the 2-meter temperature with Xsize=480, Ysize=241, Zsize (level)=1, and Tsize (time)=1460, or 1464 for leap years. The file names look like t2m.1981.nc, t2m.1982.nc, t2m.1983.nc, etc.

Based on this page (Loop through netcdf files and run calculations - Python or R), here is where I am now:

from pylab import *
import netCDF4 as nc
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np

f = nc.MFDataset('d:/data/ecmwf/t2m.????.nc') # '????' matches the years
t2mtr = f.variables['t2m']

ntimes, ny, nx = shape(t2mtr)
temp2m = zeros((ny,nx),dtype=float64)
print ntimes
for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:] # not sure how to slice this; I just want the 00Z values.
    # Is it possible to assign the average 00Z values for January only,
    # from 1981-2000, to a new array?

#creating a NetCDF file
nco = nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)

temp2m_v = nco.createVariable('t2m', 'i4',  ( 'y', 'x'))
temp2m_v.units='Kelvin'
temp2m_v.long_name='2 meter Temperature'
temp2m_v.grid_mapping = 'Lambert_Conformal' # can this be something else, or be
# eliminated? This is straight from the solution on that webpage.

lono = nco.createVariable('longitude','f8')
lato = nco.createVariable('latitude','f8')
xo = nco.createVariable('x','f4',('x')) #not sure if this is important
yo = nco.createVariable('y','f4',('y')) #not sure if this is important
lco = nco.createVariable('Lambert_Conformal','i4') #not sure

#copy all the variable attributes from original file
for var in ['longitude','latitude']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))

# copy variable data for lon,lat,x and y
lono=f.variables['longitude'][:]
lato=f.variables['latitude'][:]
#xo[:]=f.variables['x']
#yo[:]=f.variables['y']

#  write the temp at 2 m data
temp2m_v[:,:]=temp2m

# copy Global attributes from original file
for att in f.ncattrs():
    setattr(nco,att,getattr(f,att))

nco.Conventions='CF-1.6' # not sure what this is.
nco.close()

#attempt to plot the 00zJan mean
file=nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','r')
t2mtr=file.variables['t2m'][:]
lon=file.variables['longitude'][:]
lat=file.variables['latitude'][:]
clevs=np.arange(0,500.,10.)
map = Basemap(projection='cyl',llcrnrlat=0.,urcrnrlat=10.,llcrnrlon=97.,urcrnrlon=110.,resolution='i')
x,y=map(*np.meshgrid(lon,lat))
cs = map.contourf(x,y,t2mtr,clevs,extend='both')
map.drawcoastlines()
map.drawcountries()
plt.plot(cs)
plt.show()

First question: at `temp2m += t2mtr[i,:,:]`, I am not sure how to slice the data to get only the 00Z values (say, for January only) from all the files.

Second, while running the test, an error came up at `cs = map.contourf(x,y,t2mtr,clevs,extend='both')`: "shape does not match that of z: found (1,1) instead of (241,480)". I suspect something is wrong in the output data, due to how the values were written, but I can't figure out what or where.

Thanks for your time. I hope this is not confusing.

Fir Nor
  • What do you mean by 00Z data? I don't understand what format your dataset is in: 3D, with shape = (time, x, y), is what I understand. Where/what is Z? –  Mar 09 '14 at 19:53
  • Each file contains 00Z, 06Z, 12Z and 18Z (time, UTC); it's four-times-daily data. So, say, in a file t2m.2000.nc, t = 1464: four records per day for one (leap) year. Since the data is at the surface (2-meter temperature), the Z size = 1. It's global gridded data. – Fir Nor Mar 10 '14 at 15:49

1 Answer


So t2mtr is a 3-d array:

ntimes, ny, nx = shape(t2mtr)

This loop sums all values across the first (time) axis:

for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:]

A better way to do this is:

temp2m = np.sum(t2mtr, axis=0)
temp2m = t2mtr.sum(axis=0) # equivalent method form

If you want the average, use np.mean instead of np.sum.

To average across a subset of the times, jan_times, use an expression like:

jan_avg = np.mean(t2mtr[jan_times,:,:], axis=0)
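For instance, to pick out only the 00Z January records, a minimal sketch of building `jan_times` (assuming four records per day starting at 00Z, and a non-leap year at the start of the time axis):

```python
import numpy as np

steps_per_day = 4                      # 00, 06, 12, 18 UTC
jan_days = np.arange(31)               # day offsets within January
jan_times = jan_days * steps_per_day   # record 0 of each day is the 00Z value
# jan_times -> [0, 4, 8, ..., 120]
```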

This is simplest if you want just a contiguous range, e.g. the first 31 times. For simplicity I'm assuming the data is daily and years are a constant length; you can adjust for the four-times-daily frequency and leap years.

t2mtr[0:31,:,:]

A simplistic way of getting January data for several years is to construct an index like:

yr_starts = np.arange(0,3)*365 # can adjust for leap years
jan_times = (yr_starts[:,None]+ np.arange(31)).flatten()
# array([  0,   1,   2, ...  29,  30, 365, ..., 756, 757, 758, 759, 760])

Another option would be to reshape t2mtr (this doesn't work well for leap years):

t2mtr.reshape(nyrs, 365, ny, nx)[:,0:31,:,:].mean(axis=1)
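Combining the reshape idea with the 00Z selection, a sketch on toy scalar data (assuming non-leap years and four records per day; the real data would be `t2mtr` with its spatial axes kept):

```python
import numpy as np

# toy stand-in for 3 non-leap years of four-times-daily scalar data
nyrs = 3
data = np.arange(nyrs * 365 * 4, dtype=float)

# (year, day, record-of-day); record 0 of each day is the 00Z value
daily_00z = data.reshape(nyrs, 365, 4)[:, :, 0]
jan_00z_mean = daily_00z[:, :31].mean(axis=1)   # one January 00Z mean per year
```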

You could test the time sampling with something like:

np.arange(5*365).reshape(5,365)[:,0:31].mean(axis=1)

Doesn't the dataset have a time variable? You might be able to extract the desired time indices from that. I worked with ECMWF data a number of years ago, but I don't remember many of the details.
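One hedged sketch of that idea: once the time variable is decoded to datetimes (e.g. with `netCDF4.num2date`), the January 00Z indices fall out of a boolean mask. The `numpy.datetime64` version below fabricates two non-leap years of 6-hourly timestamps just to show the masking:

```python
import numpy as np

# stand-in for the decoded time variable: two non-leap years, 6-hourly
start = np.datetime64('1981-01-01T00')
times = start + np.arange(2 * 365 * 4) * np.timedelta64(6, 'h')

month = times.astype('datetime64[M]').astype(int) % 12 + 1   # calendar month 1..12
hour = (times - times.astype('datetime64[D]')).astype(int)   # hour of day 0/6/12/18
jan_00z = np.nonzero((month == 1) & (hour == 0))[0]          # indices into the time axis
```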

As for your contourf error, I would check the shapes of the three main arguments: x, y and t2mtr. They should match. I haven't worked with Basemap.
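A quick way to see the mismatch is to print the shapes right before calling `contourf`. With full 1-D `longitude` and `latitude` arrays, `meshgrid` produces `(241, 480)` grids matching `t2mtr`; if lon/lat come back from the file as scalars (for example, variables created without dimensions, or assigned with `=` instead of `[:] =`), `meshgrid` collapses to `(1, 1)`, which matches the reported error. A sketch with made-up coordinate values:

```python
import numpy as np

lon = np.linspace(0.0, 359.25, 480)    # stand-ins for the file's coordinate arrays
lat = np.linspace(90.0, -90.0, 241)
x, y = np.meshgrid(lon, lat)
print(x.shape, y.shape)                # both should be (241, 480), like t2mtr
```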

hpaulj
  • Thank you @hpaulj for the explanation and solution; exactly what I'm looking for. Yes, the data can be subset to certain time indices, but I downloaded it the way I did to save time (and a bit of space). Thanks again. – Fir Nor Mar 10 '14 at 16:01
  • Hi @hpaulj, I wonder, in `yr_starts = np.arange(0,3)*365`, does the `(0,3)` refer to the time step (in this case, the four-times-daily data) or some number of years? Thanks! – Fir Nor Mar 13 '14 at 13:41
  • `np.arange(0,3)*365` is just `[0, 365, 2*365]`, which I assumed would be the indexes of successive Jan 1 data points. It needs an additional `*4` for your four-times-daily data, and an offset for other starting points and leap years. – hpaulj Mar 13 '14 at 16:28
  • I'd also check the data for `nan` (or equivalent). When I used ECMWF data I needed both point data, like temperature, and period data, like precipitation. One set had a `nan` at the end of the requested time period, the other at the beginning. – hpaulj Mar 13 '14 at 16:31
  • Thanks @hpaulj. I just realized I can check `numpy.arange` in the Python console as well. :) – Fir Nor Mar 14 '14 at 01:11