
I am looking to take a subset of the netcdf data set bounded by lat/lon coordinates.

<xarray.Dataset>
Dimensions:              (ICcheckNameLen: 72, ICcheckNum: 55, QCcheckNameLen: 60, QCcheckNum: 10, maxAutoStaLen: 6, maxLocationLen: 24, maxMETARLen: 256, maxRepLen: 6, maxSkyCover: 6, maxSkyLen: 8, maxStaNamLen: 5, maxStaticIds: 10000, maxWeatherLen: 25, nInventoryBins: 24, recNum: 8329, totalIdLen: 6)
Dimensions without coordinates: ICcheckNameLen, ICcheckNum, QCcheckNameLen, QCcheckNum, maxAutoStaLen, maxLocationLen, maxMETARLen, maxRepLen, maxSkyCover, maxSkyLen, maxStaNamLen, maxStaticIds, maxWeatherLen, nInventoryBins, recNum, totalIdLen
Data variables:
    nStaticIds           int32 ...
    staticIds            (maxStaticIds, totalIdLen) |S1 ...
    lastRecord           (maxStaticIds) int32 ...
    invTime              (recNum) int32 ...
    prevRecord           (recNum) int32 ...
    inventory            (maxStaticIds) int32 ...
    globalInventory      int32 ...
    firstOverflow        int32 ...
    isOverflow           (recNum) int32 ...
    firstInBin           (nInventoryBins) int32 ...
    lastInBin            (nInventoryBins) int32 ...
    secondsStage1_2      (recNum) int32 ...
    secondsStage3        (recNum) int32 ...
    wmoId                (recNum) int32 ...
    stationName          (recNum, maxStaNamLen) |S1 ...
    locationName         (recNum, maxLocationLen) |S1 ...
    QCT                  (QCcheckNum, QCcheckNameLen) |S1 ...
    ICT                  (ICcheckNum, ICcheckNameLen) |S1 ...
    latitude             (recNum) float32 ...
    longitude            (recNum) float32 ...
    elevation            (recNum) float32 ...

I have tried multiple methods based on Help1 and Help2 to set up the boundaries, which should be latitude [20, 53] and longitude [-131, -62]. The dataset can be accessed at NetCDF Data.

When I use the code below, it raises "ValueError: dimensions or multi-index levels ['latitude', 'longitude'] do not exist"

import xarray as xr
ds = xr.open_dataset('/home/awips/python-awips/ups/20181110_1600.nc',
                     decode_cf=False)
print(ds)
lat_bnds, lon_bnds = [20, 53], [-131, -62]
ds.sel(latitude=slice(*lat_bnds), longitude=slice(*lon_bnds))
ds.to_netcdf(path='/home/awips/python-awips/ups/subset.nc')

When I try the code below, it runs through the data but does not remove any of it.

import xarray as xr
ds = xr.open_dataset('/home/awips/python-awips/ups/20181110_1600.nc', decode_cf=True)

ds.where((-131 < ds.longitude) & (ds.longitude < -62)
         & (20 < ds.latitude) & (ds.latitude < 53), drop=True)
ds.to_netcdf(path='/home/awips/python-awips/ups/subset.nc')

Any ideas?

WxJack
  • Your second approach looks like the right way to solve this, for cases where `latitude` and `longitude` are not dimensions. I'm surprised it isn't removing any data -- are you sure there are records with longitude/latitude outside those bounds? – shoyer Nov 15 '18 at 21:20
  • I changed the code to `ds.where((-95 < ds.longitude) & (ds.longitude < -80) & (30 < ds.latitude) & (ds.latitude < 35), drop=True)`. The file it creates is twice the size of the original, so something is wrong somewhere. The lat values are still there for -75. – WxJack Nov 15 '18 at 21:49
  • @shoyer, I was able to remove the data by assigning it to a new variable, but I am not sure how to properly save the new data. What worked was `latitude = ds.latitude.where((ds.latitude > 20) & (ds.latitude < 53), drop=True)` and `longitude = ds.longitude.where((ds.longitude > -131) & (ds.longitude < -62), drop=True)`. Any further ideas? – WxJack Nov 16 '18 at 01:59

1 Answer


Xarray operations usually return new objects instead of modifying objects in place. So you need to assign the result of `where` to a new variable and save that instead, e.g.,

ds2 = ds.where((-131 < ds.longitude) & (ds.longitude < -62)
               & (20 < ds.latitude) & (ds.latitude < 53), drop=True)
ds2.to_netcdf(path='/home/awips/python-awips/ups/subset.nc')
shoyer
  • If I use `ds2 = ds.where((ds.latitude > 20) & (ds.latitude < 50) & (ds.longitude > -131) & (ds.longitude < -62), drop=True)` followed by `ds2.to_netcdf(path='/home/awips/python-awips/ups/subset.nc')`, the size of the file increases dramatically and it has tons of repeated variables that were not there before. – WxJack Nov 18 '18 at 15:58
  • I don't know what's going on with the extra variables, but read this issue for a discussion of what's (probably) happening with file sizes: https://github.com/pydata/xarray/issues/1572 – shoyer Nov 19 '18 at 16:10
  • @shoyer this could also be related to the variables _not_ indexed by latitude/longitude being broadcast against the where condition. The syntax you suggest should only be used for the variables in the Dataset that are indexed by `latitude` and `longitude`, no? – Michael Delgado Nov 26 '18 at 02:30
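
Following up on the broadcasting point in the last comment: one possible workaround (a sketch, not something confirmed in this thread) is to build the boolean mask from `latitude` and `longitude` and use it to select positions along `recNum` with `isel`, rather than calling `where` on the whole Dataset. Variables that do not have the `recNum` dimension are then left alone. The small Dataset below is synthetic and only mimics the layout of the file in the question; the values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the station file (values are invented):
# three variables share the recNum dimension, one does not.
ds = xr.Dataset(
    {
        "latitude": ("recNum", np.array([10.0, 25.0, 40.0, 60.0], dtype="float32")),
        "longitude": ("recNum", np.array([-100.0, -120.0, -70.0, -90.0], dtype="float32")),
        "elevation": ("recNum", np.array([5.0, 300.0, 12.0, 80.0], dtype="float32")),
        "lastRecord": ("maxStaticIds", np.arange(3, dtype="int32")),
    }
)

# Boolean mask along recNum for the bounding box from the question.
mask = (
    (ds.latitude > 20) & (ds.latitude < 53)
    & (ds.longitude > -131) & (ds.longitude < -62)
)

# Convert the mask to integer positions and index recNum only.
keep = np.flatnonzero(mask.values)
subset = ds.isel(recNum=keep)

print(subset)
```

Here `subset` could be written out with `subset.to_netcdf(...)` as in the answer. Because nothing is masked with NaN, the integer variables should keep their original dtypes, which may also help with the file-size growth discussed above.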