
I need to process a single variable in a netCDF file that actually contains many attributes and variables. I think it is not possible to update a netCDF file in place (see the question How to delete a variable in a Scientific.IO.NetCDF.NetCDFFile?).

My approach is the following:

  1. get the variable to process from the original file
  2. process the variable
  3. copy all data from the original netcdf BUT the processed variable to the final file
  4. copy the processed variable to the final file

My problem is to code step 3. I started with the following:

import itertools
from Scientific.IO.NetCDF import NetCDFFile

def processing(infile, variable, outfile):
    fileH = NetCDFFile(infile, mode="r")
    data = fileH.variables[variable][:]

    # do processing on data...

    # and now save the result
    outfile = NetCDFFile(outfile, mode='w')
    # build a list of variables without the processed variable
    listOfVariables = list(itertools.ifilter(lambda x: x != variable, fileH.variables.keys()))
    for ivar in listOfVariables:
        # here I need to write each variable and each attribute

How can I save all data and attributes in a handful of code without having to rebuild the whole data structure?

Bruno von Paris

6 Answers


Here's what I just used and it worked: @arne's answer updated for Python 3 and extended to also copy the variable attributes:

import netCDF4
toexclude = ['ExcludeVar1', 'ExcludeVar2']

with netCDF4.Dataset("in.nc") as src, netCDF4.Dataset("out.nc", "w") as dst:
    # copy global attributes all at once via dictionary
    dst.setncatts(src.__dict__)
    # copy dimensions
    for name, dimension in src.dimensions.items():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.items():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst[name][:] = src[name][:]
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)
Rich Signell
    As of netCDF4 you must define any fill value before setting the data, otherwise you might get an `AttributeError`. Therefore, it is better to write the attributes before the data (i.e. swap the order of the last two lines: `dst[name][:] = src[name][:]` and `dst[name].setncatts(src[name].__dict__)`). – Verwirrt Aug 07 '18 at 07:51
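
An alternative (not part of the answer above, just a sketch along the same lines) is to sidestep the ordering question by pulling `_FillValue` out of the attribute dictionary and passing it to `createVariable`, since netCDF4 generally only allows the fill value to be set at creation time. The file names and excluded-variable names below are placeholders:

import netCDF4

toexclude = ['ExcludeVar1', 'ExcludeVar2']   # placeholder names

with netCDF4.Dataset("in.nc") as src, netCDF4.Dataset("out.nc", "w") as dst:
    dst.setncatts(src.__dict__)
    for name, dimension in src.dimensions.items():
        dst.createDimension(name, len(dimension) if not dimension.isunlimited() else None)
    for name, variable in src.variables.items():
        if name in toexclude:
            continue
        attrs = dict(src[name].__dict__)
        # _FillValue has to be given at creation time via the fill_value keyword
        fill = attrs.pop('_FillValue', None)
        dst.createVariable(name, variable.datatype, variable.dimensions, fill_value=fill)
        dst[name].setncatts(attrs)      # remaining attributes, written before the data
        dst[name][:] = src[name][:]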

If you just want to copy the file while picking out certain variables, nccopy is a great tool, as submitted by @rewfuss.

Here's a Pythonic (and more flexible) solution with python-netCDF4. This allows you to open the data for processing and other calculations before writing it to a file.

with netCDF4.Dataset(file1) as src, netCDF4.Dataset(file2) as dst:

  for name, dimension in src.dimensions.iteritems():
    dst.createDimension(name, len(dimension) if not dimension.isunlimited() else None)

  for name, variable in src.variables.iteritems():

    # take out the variable you don't want
    if name == 'some_variable': 
      continue

    x = dst.createVariable(name, variable.datatype, variable.dimensions)
    dst.variables[x][:] = src.variables[x][:]

This does not take variable attributes, such as fill values, into account. You can add that easily by following the documentation.
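
For instance, something along these lines could slot into the variable loop above (a sketch only; `_FillValue` is special and would have to go through the `fill_value` keyword of `createVariable` instead):

    # copy every attribute of the source variable except the reserved fill value
    for attr in src.variables[name].ncattrs():
        if attr != '_FillValue':
            dst.variables[name].setncattr(attr, src.variables[name].getncattr(attr))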

Do be careful: changes to netCDF4 files written/created this way cannot be undone. The moment you modify a variable, it is written to the file at the end of the with statement, or when you call .close() on the Dataset.

Of course, if you wish to process the variables before writing them, you have to be careful about which dimensions to create. In a new file, never write to variables without creating them first. Also, never create variables without having defined their dimensions, as noted in python-netCDF4's documentation.
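
As a rough sketch of that ordering when writing a processed variable to a new file (the dimension and variable names here are made up for illustration):

import numpy as np
import netCDF4

with netCDF4.Dataset("processed.nc", "w") as dst:
    # 1. define the dimensions first
    dst.createDimension("time", None)          # unlimited dimension
    dst.createDimension("lat", 73)
    dst.createDimension("lon", 144)
    # 2. then create the variable on those dimensions
    var = dst.createVariable("processed_var", "f4", ("time", "lat", "lon"))
    # 3. only then write the (already processed) data
    var[0, :, :] = np.zeros((73, 144), dtype="f4")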

Xavier Ho
    Very nice solution to the problem, but there were a few fixes needed before I could get this working. First, '.iteritems()' no longer works in 3.x and needs to be changed to just '.items()'. Second, replace the use of x with the string name of the variable, like so: 'dst.variables[name][:] = src.variables[name][:]'. – captain_M Feb 19 '17 at 22:16

This answer builds on the one from Xavier Ho (https://stackoverflow.com/a/32002401/7666), but with the fixes I needed to complete it:

import netCDF4 as nc
import numpy as np
toexclude = ["TO_REMOVE"]
with nc.Dataset("orig.nc") as src, nc.Dataset("filtered.nc", "w") as dst:
    # copy attributes
    for name in src.ncattrs():
        dst.setncattr(name, src.getncattr(name))
    # copy dimensions
    for name, dimension in src.dimensions.iteritems():
        dst.createDimension(
            name, (len(dimension) if not dimension.isunlimited() else None))
    # copy all file data except for the excluded
    for name, variable in src.variables.iteritems():
        if name not in toexclude:
            x = dst.createVariable(name, variable.datatype, variable.dimensions)
            dst.variables[name][:] = src.variables[name][:]
Arne Babenhauserheide

The nccopy utility in C netCDF versions 4.3.0 and later includes an option to list which variables are to be copied (along with their attributes). Unfortunately, it doesn't include an option for which variables to exclude, which is what you need.

However, if the list of (comma-delimited) variables to be included doesn't cause the nccopy command-line to exceed system limits, this would work. There are two variants for this option:

nccopy -v var1,var2,...,varn input.nc output.nc
nccopy -V var1,var2,...,varn input.nc output.nc

The first (-v) includes the definitions of all variables but only the data of the named ones. The second (-V) includes neither the definitions nor the data of variables that are not named.
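
If the list of variables to keep has to be built programmatically (for example, everything except the one you want to drop), one option is to read the variable names with python-netCDF4 and shell out to nccopy. A sketch, assuming nccopy is on your PATH and the file and variable names below are placeholders:

import subprocess
import netCDF4

exclude = {"var_to_drop"}
with netCDF4.Dataset("input.nc") as src:
    keep = [name for name in src.variables if name not in exclude]

# -V: copy only the listed variables (their definitions and their data)
subprocess.run(["nccopy", "-V", ",".join(keep), "input.nc", "output.nc"], check=True)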

rewfuss

I know this is an old question, but as an alternative you can use the netcdf library together with shutil:

import shutil
from netcdf import netcdf as nc

def processing(infile, variable, outfile):
    shutil.copyfile(infile, outfile)
    with nc.loader(infile) as in_root, nc.loader(outfile) as out_root:
        data = nc.getvar(in_root, variable)
        # do your processing with data and save them as memory "values"...
        values = data[:] * 3
        new_var = nc.getvar(out_root, variable, source=data)
        new_var[:] = values
ecolell
    hi @ecolell, I know this is an older response. In the meantime we work with netCDF4, and there seems to be no 'loader' class included. Would you know how to apply this code with netCDF4? – Linda May 07 '20 at 09:00
    @Linda I'm not active on this library, but I've just uploaded a copy of the last version of the library. In the following link you will find the code: https://github.com/ecolell/netcdf/blob/0dafde1f72fcb932f5ed38e99019961456f8f1ce/netcdf/netcdf.py#L342 (hope it helps). – ecolell May 11 '20 at 15:29

All of the recipes so far (except for the one from @rewfuss, which works fine but is not exactly a Pythonic one) produce a plain NetCDF3 file, which can be a killer for highly compressed NetCDF4 datasets. Here is an attempt to cope with the issue.

import netCDF4

infname = "Inf.nc"
outfname = "outf.nc"

skiplist = "var1 var2".split()

with netCDF4.Dataset(infname) as src:

    with netCDF4.Dataset(outfname, "w", format=src.file_format) as dst:
        # copy global attributes all at once via dictionary
        dst.setncatts(src.__dict__)
        # copy dimensions
        for name, dimension in src.dimensions.items():
            dst.createDimension(
                name, (len(dimension) if not dimension.isunlimited() else None))
        # copy all file data except for the excluded variables
        for name, variable in src.variables.items():
            if name in skiplist:
                continue
            # reuse the compression filters and chunking of the source variable
            createattrs = variable.filters()
            if createattrs is None:
                createattrs = {}
            else:
                chunksizes = variable.chunking()
                if chunksizes == "contiguous":
                    createattrs["contiguous"] = True
                else:
                    createattrs["chunksizes"] = chunksizes
            dst.createVariable(name, variable.datatype, variable.dimensions, **createattrs)
            # copy variable attributes all at once via dictionary
            dst[name].setncatts(src[name].__dict__)
            dst[name][:] = src[name][:]

This seems to work fine and stores the variables the way they are in the original file, except that it does not copy some variable attributes that start with an underscore and are not known to the netCDF library.

Roux