12

I would like to extract a spatial subset of a rather large netcdf file. From Loop through netcdf files and run calculations - Python or R

from pylab import *
import netCDF4

f = netCDF4.MFDataset('/usgs/data2/rsignell/models/ncep/narr/air.2m.1989.nc')
# print variables
f.variables.keys()
atemp = f.variables['air'] # TODO: extract spatial subset

How do I extract just the subset of netcdf file corresponding to a state (say Iowa). Iowa has following boundary lat lon:

Longitude: 89° 5' W to 96° 31' W

Latitude: 40° 36' N to 43° 30' N

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
user308827
  • 21,227
  • 87
  • 254
  • 417

7 Answers7

17

Well this is pretty easy, you have to find the index for the upper and lower bound in latitude and longitude. You can do it by finding the value that is closest to the ones you're looking for.

latbounds = [ 40 , 43 ]
lonbounds = [ -96 , -89 ] # degrees east ? 
lats = f.variables['latitude'][:] 
lons = f.variables['longitude'][:]

# latitude lower and upper index
latli = np.argmin( np.abs( lats - latbounds[0] ) )
latui = np.argmin( np.abs( lats - latbounds[1] ) ) 

# longitude lower and upper index
lonli = np.argmin( np.abs( lons - lonbounds[0] ) )
lonui = np.argmin( np.abs( lons - lonbounds[1] ) )  

Then just subset the variable array.

# Air (time, latitude, longitude) 
airSubset = f.variables['air'][ : , latli:latui , lonli:lonui ] 
  • Note, i'm assuming the longitude dimension variable is in degrees east, and the air variable has time, latitude, longitude dimensions.
Favo
  • 818
  • 6
  • 15
  • Is there a straightforward way to spit out airSubset as a netcdf file as well? – user308827 Mar 20 '15 at 03:07
  • 2
    With the netcdf4-python library the most straightforward way is creating a new netcdf file, adding dimensions, the variable name, its attributes and the save the array of data. That is around 4~5 lines of code. Another good python library is IRIS (http://scitools.org.uk/iris) , it has pretty good methods to plot, interpolate , regrid, and save subsets of netcdf files. – Favo Mar 22 '15 at 22:06
  • When I try this on my dataset I get "slicing expression exceeds the number of dimensions of the variable", how do I get it to return all the dimensional data that falls within the specified spatial extent? – MrKingsley May 06 '22 at 12:59
12

Favo's answer works (I assume; haven't checked). A more direct and idiomatic way is to use numpy's where function to find the necessary indices.

lats = f.variables['latitude'][:] 
lons = f.variables['longitude'][:]
lat_bnds, lon_bnds = [40, 43], [-96, -89]

lat_inds = np.where((lats > lat_bnds[0]) & (lats < lat_bnds[1]))
lon_inds = np.where((lons > lon_bnds[0]) & (lons < lon_bnds[1]))

air_subset = f.variables['air'][:,lat_inds,lon_inds]
Spencer Hill
  • 899
  • 6
  • 23
  • 2
    Using this comment, I get the following error IndexError: Index cannot be multidimensional This is solved by adding [0] to the lines starting with lat_inds and lon_inds. – Femkemilene Apr 22 '19 at 09:37
  • I'm getting the same error but where are you adding [0] to get this to work? Can you provide an example? – MrKingsley May 06 '22 at 12:52
7

If you like pandas, then you should think about checking out xarray.

import xarray as xr

ds = xr.open_dataset('http://geoport.whoi.edu/thredds/dodsC/usgs/data2/rsignell/models/ncep/narr/air.2m.1980.nc',
                     decode_cf=False)
lat_bnds, lon_bnds = [40, 43], [-96, -89]
ds.sel(lat=slice(*lat_bnds), lon=slice(*lon_bnds))
jsignell
  • 3,072
  • 1
  • 22
  • 23
  • 2
    ds.where will just [mask](http://xarray.pydata.org/en/stable/indexing.html#masking-with-where), I'd suggest `ds.sel(lat=slice(*lat_bnds), lon=slice(*lon_bnds))` instead. – user2856 Nov 23 '16 at 04:00
  • When I try this one I get key error 'lat is not a valid dimension or coordinate' in my dataset it is 'latitude' and 'longitude but when I change the lines to ds.sel(latitude=slice(*latbound), longitude=slice(*lonbound)) I get the same key error for latitude. Using print(List.ds.keys()) it shows latitude and longitude as keys but I still get a key error. Can I use an index? – MrKingsley May 06 '22 at 13:36
4

Note that this can be accomplished even quicker on the command line using NCO's ncks.

ncks -v air -d latitude,40.,43. -d longitude,-89.,-96. infile.nc -O subset_infile.nc

N1B4
  • 3,377
  • 1
  • 21
  • 24
2

To mirror the response from N1B4, you can also do it on one line with climate data operators (cdo):

cdo sellonlatbox,-96.5,-89,40,43 in.nc out.nc

Thus to loop over a set of file, I would do this in a BASH script, using cdo to process each file and then calling your python script:

#!/bin/bash

# pick up a list of files (I'm presuming the loop is over the years)
files=`ls /usgs/data2/rsignell/models/ncep/narr/air.2m.*.nc`

for file in $files ; do 
   # extract the location, I haven't used your exact lat/lons
   cdo sellonlatbox,-96.5,-89,40,43 $file iowa.nc

   # Call your python or R script here to process file iowa.nc
   python script
done 

I always try and do my file processing "offline" as I find it less prone to error. cdo is an alternative to ncks, I'm not saying it is better, I just find it easier to remember the commands. nco in general is more powerful and I resort to it when cdo can't perform the task I wish to carry out.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
1

Small change needs to be made to the lonbounds part (data are degrees east), because the longitude value ranges from 0 to 359 in the data, so negative numbers will not work in this case. Also the calculation for latli and latui needs to be switched because the value goes from north to south, 89 to -89.

latbounds = [ 40 , 43 ]
lonbounds = [ 260 , 270 ] # degrees east
lats = f.variables['latitude'][:] 
lons = f.variables['longitude'][:]

# latitude lower and upper index
latli = np.argmin( np.abs( lats - latbounds[1] ) )
latui = np.argmin( np.abs( lats - latbounds[0] ) ) 

# longitude lower and upper index
lonli = np.argmin( np.abs( lons - lonbounds[0] ) )
lonui = np.argmin( np.abs( lons - lonbounds[1] ) )  
ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
0

If you are working in Linux or macOS this can be handled very easily using nctoolkit (https://nctoolkit.readthedocs.io/en/latest/):

import nctoolkit as nc
data = nc.open_data('/usgs/data2/rsignell/models/ncep/narr/air.2m.1989.nc')
data.crop(lon = [-(96+31/60), -(89+5/6)], lat = [40 + 36/60, 43 + 30/60])
Robert Wilson
  • 3,192
  • 11
  • 19