I have a NetCDF file with data as a function of lon, lat and time. I would like to calculate the number of missing entries in each grid cell, summed over the time dimension, preferably with CDO or NCO so that I do not need to invoke R, Python, etc.

I know how to get the total number of missing values

ncap2 -s "nmiss=var.number_miss()" in.nc out.nc

as I answered to this related question: count number of missing values in netcdf file - R

and CDO can tell me the total summed over space with

cdo info in.nc

but I can't work out how to sum over time. Is there a way for example of specifying the dimension to sum over with number_miss in ncap2?

ClimateUnboxed

2 Answers

Even though you are asking for another solution, I would like to show you that it takes only one very short line to find the answer with the help of Python. The variable m_data below is a masked array, which is the same form in which the netCDF4 package returns a variable with missing values. A single np.sum call on its mask, with the correct axis specified, gives you your answer.

import numpy as np
import matplotlib.pyplot as plt
import netCDF4 as nc4

# Generate random data for this experiment.
data = np.random.rand(365, 64, 128)

# Masked data, this is how the data is read from NetCDF by the netCDF4 package.
# For this example, I mask all values less than 0.1.
m_data = np.ma.masked_array(data, mask=data<0.1)

# It only takes one operation to find the answer.
n_values_missing = np.sum(m_data.mask, axis=0)

# Just a plot of the result.
plt.figure()
plt.pcolormesh(n_values_missing)
plt.colorbar()
plt.xlabel('lon')
plt.ylabel('lat')
plt.show()

# Save a netCDF file of the results.
f = nc4.Dataset('test.nc', 'w', format='NETCDF4')
f.createDimension('lon', 128)
f.createDimension('lat', 64 )
n_values_missing_nc = f.createVariable('n_values_missing', 'i4', ('lat', 'lon'))
n_values_missing_nc[:,:] = n_values_missing[:,:]
f.close()
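
One caveat, offered as a sketch and not as part of the original answer: when a masked array has no masked entries, its `.mask` attribute can be the scalar `np.ma.nomask` instead of a full boolean array, and summing that along `axis=0` fails. `np.ma.getmaskarray` always returns a boolean array with the shape of the data, so it is the safer thing to sum. The function name `count_missing_over_time` and the assumption that time is axis 0 are illustrative.

```python
import numpy as np


def count_missing_over_time(m_data):
    """Count masked entries along axis 0 (assumed here to be time)."""
    # getmaskarray returns a full boolean array even when nothing is masked.
    return np.ma.getmaskarray(m_data).sum(axis=0)


# A tiny (time=3, lat=2, lon=2) field with two masked entries in one cell.
data = np.arange(12, dtype=float).reshape(3, 2, 2)
mask = np.zeros(data.shape, dtype=bool)
mask[0, 0, 1] = True
mask[2, 0, 1] = True
some_missing = np.ma.masked_array(data, mask=mask)
none_missing = np.ma.masked_array(data)  # mask is the scalar np.ma.nomask

print(count_missing_over_time(some_missing))
print(count_missing_over_time(none_missing))
```

The second call returns a field of zeros with the same (lat, lon) shape, so the result can be written to a file the same way whether or not any values are missing.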
Chiel
  • Yes, it is concise in Python, upvote for the answer. I may have to do this in Python, I think; I can always write the field back out to a netcdf file, I suppose. – ClimateUnboxed May 11 '17 at 22:34
  • I added a piece of code that shows you how to save to netcdf. – Chiel May 13 '17 at 12:19
  • Chiel, I like your answer and it is compact and neat, but I switched the accepted answer to the updated NCO solution as it allows me to do the operation from the command line. Both answers are excellent. – ClimateUnboxed May 18 '17 at 13:20

We added the missing() function to ncap2 to solve this problem elegantly as of NCO 4.6.7 (May, 2017). To count missing values through time:

ncap2 -s 'mss_val=three_dmn_var_dbl.missing().ttl($time)' in.nc out.nc

Here ncap2 chains two methods together: missing(), followed by a total over the time dimension. The resulting 2D variable mss_val is written to out.nc. The older response below instead totals over space and reports a count for each time step (because I misinterpreted the OP).
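
For readers more at home in NumPy, the chained call above can be pictured as two steps: build a true/false field marking missing entries, then total it over the time axis. The sketch below is an analogy only, not NCO's implementation; the fill value `-999.0` and the (time, lat, lon) array layout are assumptions made for the example.

```python
import numpy as np

FILL = -999.0  # assumed missing-value marker for this example

# Tiny (time=4, lat=2, lon=3) field with a few fill values.
var = np.ones((4, 2, 3))
var[0, 0, 0] = FILL
var[1, 0, 0] = FILL
var[3, 1, 2] = FILL

is_missing = (var == FILL)        # step 1: flag missing entries
mss_val = is_missing.sum(axis=0)  # step 2: total over time (axis 0)

print(mss_val)
```

The result has one entry per (lat, lon) cell, which is the 2D field the question asks for.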

Old/obsolete answer:

There are two ways to do this with NCO/ncap2, though neither is as elegant as I would like. Either assemble the answer one record at a time by calling number_miss() on each record, or (my preference) use the boolean comparison function followed by the total operator along the axes of choice:

zender@aerosol:~$ ncap2 -O -s 'tmp=three_dmn_var_dbl;mss_val=tmp.get_miss();tmp.delete_miss();tmp_bool=(tmp==mss_val);tmp_bool_ttl=tmp_bool.ttl($lon,$lat);print(tmp_bool_ttl);' ~/nco/data/in.nc ~/foo.nc
tmp_bool_ttl[0]=0 
tmp_bool_ttl[1]=0 
tmp_bool_ttl[2]=0 
tmp_bool_ttl[3]=8 
tmp_bool_ttl[4]=0 
tmp_bool_ttl[5]=0 
tmp_bool_ttl[6]=0 
tmp_bool_ttl[7]=1 
tmp_bool_ttl[8]=0 
tmp_bool_ttl[9]=2

or

zender@aerosol:~$ ncap2 -O -s 'for(rec=0;rec<time.size();rec++){nmiss=three_dmn_var_int(rec,:,:).number_miss();print(nmiss);}' ~/nco/data/in.nc ~/foo.nc
nmiss = 0 

nmiss = 0 

nmiss = 8 

nmiss = 0 

nmiss = 0 

nmiss = 1 

nmiss = 0 

nmiss = 2 

nmiss = 1 

nmiss = 2 
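
Both older approaches report one count per time step. A rough NumPy analogue, assuming a (time, lat, lon) layout and an illustrative fill value of `-999.0`, suggests that the record-by-record loop and the boolean total over the spatial axes give the same numbers, and that switching the axis to time yields a 2D map instead. This is a sketch of the idea, not a translation of the ncap2 scripts.

```python
import numpy as np

FILL = -999.0  # illustrative missing-value marker

# Random (time=5, lat=3, lon=4) field; values below 0.2 become "missing".
rng = np.random.default_rng(0)
var = rng.random((5, 3, 4))
var[var < 0.2] = FILL

is_missing = var == FILL

# Boolean comparison followed by a total over the spatial axes.
per_time_total = is_missing.sum(axis=(1, 2))

# The same counts assembled one record (time step) at a time.
per_time_loop = np.array(
    [np.count_nonzero(is_missing[rec]) for rec in range(var.shape[0])]
)

# Totalling over time instead gives one count per grid cell.
per_cell = is_missing.sum(axis=0)

print(per_time_total)
print(per_cell)
```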
Charlie Zender
  • Thanks, upvote as it allows me to see the answer from the command line, but I was hoping to have the answer in a 2D netcdf file. I suppose adding hyperslice functionality to NCO, e.g. ncap2 -s "nmiss=var.number_miss(x,:,:)", is probably complicated? I am not very familiar with ncap2. – ClimateUnboxed May 11 '17 at 22:36
  • We have already implemented syntax similar to what you suggest (e.g., avg=var.avg($lat,$lon)), but only for averages, max/min, etc. I don't know why we did not implement that for number_miss(). Since you asked, we will put it on the TODO list :) – Charlie Zender May 11 '17 at 22:45
  • never followed up to say thanks for implementing this, has been really useful over the years, just using it again now. – ClimateUnboxed Oct 15 '21 at 10:44