I am trying to create a new NetCDF file from an existing NetCDF file. I am only interested in using 12 variables from a list of 177 variables. You can find the sample NetCDF file from this ftp site here.
I used the following code from a previous SO answer. You can find it here.
import netCDF4 as nc
file1 = '/media/sf_jason2/cycle_001/JA2_GPN_2PdP001_140_20080717_113355_20080717_123008.nc'
file2 = '/home/sandbox/test.nc'
toinclude = ['lat_20hz', 'lon_20hz', 'time_20hz', 'alt_20hz', 'ice_range_20hz_ku', 'ice_qual_flag_20hz_ku', 'model_dry_tropo_corr', 'model_wet_tropo_corr', 'iono_corr_gim_ku', 'solid_earth_tide', 'pole_tide', 'alt_state_flag_ku_band_status']
with nc.Dataset(file1) as src, nc.Dataset(file2, "w") as dst:
# copy attributes
for name in src.ncattrs():
dst.setncattr(name, src.getncattr(name))
# copy dimensions
for name, dimension in src.dimensions.iteritems():
dst.createDimension(
name, (len(dimension) if not dimension.isunlimited else None))
# copy all file data for variables that are included in the toinclude list
for name, variable in src.variables.iteritems():
if name in toinclude:
x = dst.createVariable(name, variable.datatype, variable.dimensions)
dst.variables[name][:] = src.variables[name][:]
The issue that I am having is that the original file is only 5.3 MB, however when I copy the new variables over the new file size is around 17 MB. The whole point of stripping the variables is to decrease the file size, but I am ending up with a larger file size.
I have tried using xarray as well. But I am having issues when I am trying to merge multiple variables. The following is the code that I am trying to implement in xarray.
import xarray as xr
fName = '/media/sf_jason2/cycle_001/JA2_GPN_2PdP001_140_20080717_113355_20080717_123008.nc'
file2 = '/home/sandbox/test.nc'
toinclude = ['lat_20hz', 'lon_20hz', 'time_20hz', 'alt_20hz', 'ice_range_20hz_ku', 'ice_qual_flag_20hz_ku', 'model_dry_tropo_corr', 'model_wet_tropo_corr', 'iono_corr_gim_ku', 'solid_earth_tide', 'pole_tide', 'alt_state_flag_ku_band_status']
ds = xr.open_dataset(fName)
newds = xr.Dataset()
newds['lat_20hz'] = ds['lat_20hz']
newds.to_netcdf(file2)
Xarray works fine if I am trying to copy over one variable, however, it's having issues when I am trying to copy multiple variables to an empty dataset. I couldn't find any good examples of copying multiple variables using xarray. I am fine achieving this workflow either way.
Ultimately, How can I decrease the file size of the new NetCDF that is being created using netCDF4? If that's not ideal, is there a way to add multiple variables to an empty dataset in xarray without merging issues?