4

I have the following shapefile and netcdf file.

I would like to extract data from the netcdf file that are contained within the boundaries of the shapefile.

Do you have any suggestion on how I can achieve this?

The shapefile corresponds to SREX region 11 North Europe (NEU) and the netcdf file is an example of CMIP6 climate model data output (ua variable). My desired output has to be in netcdf format.


Update

So far I tried to create a netcdf mask using NCL and CDO, and apply this mask to the original netcdf dataset. Here below the steps (and NCL scripts):

#################
## remove plev dimension from netcdf file
cdo --reduce_dim -copy nc_file.nc nc_file2.nc

## convert longitude to -180, 180
cdo sellonlatbox,-180,180,-90,90 nc_file2.nc nc_file3.nc

## create mask 
ncl create_nc_mask.ncl

## apply mask
cdo div nc_file3.nc shape1_mask.nc nc_file4.nc 
#################

The output is almost correct. See picture below. But the southern boundaries of the shapefile (SREX 11, NEU) are not captured correctly. So I suppose there is something wrong in the NCL script that generates the netcdf mask.

enter image description here

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
aaaaa
  • 149
  • 2
  • 18
  • 44
  • Are you specifically looking for nco/cdo/ncl solutions, or are other languages (e.g. Python) also welcome? – Bart Feb 19 '20 at 14:42
  • thanks @Bart. any language is welcome :) – aaaaa Feb 19 '20 at 14:52
  • What a generous bounty – Iman Nia Feb 19 '20 at 22:52
  • The solution is trivial using GDAL. Consider your NetCDF file a "raster" (in GIS parlance). You will have to convert to GeoTIFF using the same CRS as your Shapefile. Also trivial. See: https://gis.stackexchange.com/questions/118236/clip-raster-by-shapefile-in-parts – Daniel R. Livingston Feb 21 '20 at 00:02

2 Answers2

3

Re-using some old scripts/code, I quickly came up with this for a Python solution. It basically just loops over all grid points, and checks whether each grid point is inside or outside the polygon from the shape file. The result is the variable mask (array with True/False), which can be used to mask your NetCDF variables.

Note: this uses Numba (all the @jit lines) to accelerate the code, although that is not really necessary in this case. You can just comment them out if you don't have Numba.

import matplotlib.pyplot as pl
import netCDF4 as nc4
import numpy as np
import fiona
from numba import jit

@jit(nopython=True, nogil=True)
def distance(x1, y1, x2, y2):
    """
    Calculate distance from (x1,y1) to (x2,y2)
    """
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

@jit(nopython=True, nogil=True)
def point_is_on_line(x, y, x1, y1, x2, y2):
    """
    Check whether point (x,y) is on line (x1,y1) to (x2,y2)
    """

    d1 = distance(x,  y,  x1, y1)
    d2 = distance(x,  y,  x2, y2)
    d3 = distance(x1, y1, x2, y2)

    eps = 1e-12
    return np.abs((d1+d2)-d3) < eps

@jit(nopython=True, nogil=True)
def is_left(xp, yp, x0, y0, x1, y1):
    """
    Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
    returns:  >0 if left of line, 0 if on line, <0 if right of line
    """

    return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)

@jit(nopython=True, nogil=True)
def is_inside(xp, yp, x_set, y_set, size):
    """
    Given location (xp,yp) and set of line segments (x_set, y_set), determine
    whether (xp,yp) is inside polygon.
    """

    # First simple check on bounds
    if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
        return False

    wn = 0
    for i in range(size-1):

        # Second check: see if point exactly on line segment:
        if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
            return False

        # Calculate winding number
        if (y_set[i] <= yp):
            if (y_set[i+1] > yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                    wn += 1
        else:
            if (y_set[i+1] <= yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                    wn -= 1

    if wn == 0:
        return False
    else:
        return True

@jit(nopython=True, nogil=True)
def calc_mask(mask, lon, lat, shp_lon, shp_lat):
    """
    Calculate mask of grid points which are inside `shp_lon, shp_lat`
    """

    for j in range(lat.size):    
        for i in range(lon.size):
            if is_inside(lon[i], lat[j], shp_lon, shp_lat, shp_lon.size):
                mask[j,i] = True


if __name__ == '__main__':

    # Selection of time and level:
    time = 0
    plev = 0

    # Read NetCDF variables, shifting the longitudes
    # from 0-360 to -180,180, like the shape file:
    nc = nc4.Dataset('nc_file.nc')
    nc_lon = nc.variables['lon'][:]-180.
    nc_lat = nc.variables['lat'][:]
    nc_ua  = nc.variables['ua'][time,plev,:,:]

    # Read shapefile and first feature
    fc = fiona.open("shape1.shp")
    feature = next(iter(fc))

    # Extract array of lat/lon coordinates:
    coords = feature['geometry']['coordinates'][0]
    shp_lon = np.array(coords)[:,0]
    shp_lat = np.array(coords)[:,1]

    # Calculate mask
    mask = np.zeros_like(nc_ua, dtype=bool)
    calc_mask(mask, nc_lon, nc_lat, shp_lon, shp_lat)

    # Mask the data array
    nc_ua_masked = np.ma.masked_where(~mask, nc_ua)

    # Plot!
    pl.figure(figsize=(8,4))
    pl.subplot(121)
    pl.pcolormesh(nc_lon, nc_lat, nc_ua, vmin=-40, vmax=105)
    pl.xlim(-20, 50)
    pl.ylim(40, 80)

    pl.subplot(122)
    pl.pcolormesh(nc_lon, nc_lat, nc_ua_masked, vmin=-40, vmax=105)
    pl.xlim(-20, 50)
    pl.ylim(40, 80)

    pl.tight_layout()

enter image description here

EDIT

To write the mask to NetCDF, something like this can be used:

nc_out = nc4.Dataset('mask.nc', 'w')
nc_out.createDimension('lon', nc_lon.size)
nc_out.createDimension('lat', nc_lat.size)

nc_mask_out = nc_out.createVariable('mask', 'i2', ('lat','lon'))
nc_lon_out = nc_out.createVariable('lon', 'f8', ('lon'))
nc_lat_out = nc_out.createVariable('lat', 'f8', ('lat'))

nc_mask_out[:,:] = mask[:,:]  # Or ~mask to reverse it
nc_lon_out[:] = nc_lon[:]     # With +180 if needed
nc_lat_out[:] = nc_lat[:]

nc_out.close()
Bart
  • 9,825
  • 5
  • 47
  • 73
  • thanks. it works. how can I save nc_ua_masked, nc_lat and nc_lon variables as a unique netcdf file? – aaaaa Feb 21 '20 at 14:09
  • also...the nc_ua_masked array is not the full array with daily ua values..I didn't solve my problem. – aaaaa Feb 21 '20 at 15:40
  • even saving the mask as netcdf file with lat, long and TRUE/FALSE variables would be good – aaaaa Feb 21 '20 at 16:06
  • I added an example of how to write the mask to NetCDF. It uses a `short` to write the mask as `0-1`, NetCDF4 doesn't seem to support booleans (?). As an alternative, it would also be possible to apply the mask in place (on a copy of the NetCDF files) for all levels and time steps. – Bart Feb 21 '20 at 16:16
0

So far I came up with this (I know its not the complete solution):

1) To open the shapefile and nc file, you will need two packages to install:

pip3 install pyshp
pip3 install netCDF4

2) Then this is how you import them in python:

import shapefile
from netCDF4 import Dataset

3) Read data from shapefile:

with shapefile.Reader("shape1.dbf") as dbf:
    print(f'{dbf}\n')
    print(f'bounding box: {dbf.bbox}')
    shapes = dbf.shapes()
    print(f'points: {shapes[0].points}')
    print(f'parts: {shapes[0].parts}')
    print(f'fields: {dbf.fields}')
    records = dbf.records()
    dct = records[0].as_dict()
    print(f'record: {dct}')

This gives you the output:

shapefile Reader
    1 shapes (type 'POLYGON')
    1 records (4 fields)

bounding box: [-10.0, 48.0, 40.0, 75.0]
points: [(-10.0, 48.0), (-10.0, 75.0), (40.0, 75.0), (40.0, 61.3), (-10.0, 48.0)]
parts: [0]
fields: [('DeletionFlag', 'C', 1, 0), ['NAME', 'C', 40, 0], ['LAB', 'C', 40, 0], ['USAGE', 'C', 40, 0]]
record: {'NAME': 'North Europe [NEU:11]', 'LAB': 'NEU', 'USAGE': 'land'}

4) Read the nc file:

nc_fdata = Dataset('nc_file.nc', 'r')

5) Use this helper function to see what is inside:

def ncdump(nc_fid, verb=True):
    def print_ncattr(key):
        try:
            print("\t\ttype:", repr(nc_fid.variables[key].dtype))
            for ncattr in nc_fid.variables[key].ncattrs():
                print('\t\t%s:' % ncattr, repr(nc_fid.variables[key].getncattr(ncattr)))
        except KeyError:
            print("\t\tWARNING: %s does not contain variable attributes" % key)

    # NetCDF global attributes
    nc_attrs = nc_fid.ncattrs()
    if verb:
        print("NetCDF Global Attributes:")
        for nc_attr in nc_attrs:
            print('\t%s:' % nc_attr, repr(nc_fid.getncattr(nc_attr)))
    nc_dims = [dim for dim in nc_fid.dimensions]  # list of nc dimensions
    # Dimension shape information.
    if verb:
        print("NetCDF dimension information:")
        for dim in nc_dims:
            print("\tName:", dim)
            print("\t\tsize:", len(nc_fid.dimensions[dim]))
            print_ncattr(dim)
    # Variable information.
    nc_vars = [var for var in nc_fid.variables]  # list of nc variables
    if verb:
        print("NetCDF variable information:")
        for var in nc_vars:
            if var not in nc_dims:
                print('\tName:', var)
                print("\t\tdimensions:", nc_fid.variables[var].dimensions)
                print("\t\tsize:", nc_fid.variables[var].size)
                print_ncattr(var)
    return nc_attrs, nc_dims, nc_vars

nc_attrs, nc_dims, nc_vars = ncdump(nc_fdata)

I guess you will need the variable called 'ua', because that has both longitude and latitude addresses among others.

So in order to construct the mask, you will have to extract everything from 'ua', where the longitude and latitude is between the shapefile's boundingbox values.

kalzso
  • 502
  • 2
  • 6
  • 27
  • How does this answer the question? Reading the NetCDF and shape files is the trivial part of the question (which is covered by just about every online tutorial). – Bart Feb 21 '20 at 11:20
  • In the very first line, I wrote that this is not a complete solution. The user wants to extract data from the file, I helped him how can he do that in python, and left a hint how he could proceed. – kalzso Feb 21 '20 at 11:45
  • My opinion: you leave the most difficult part open (and only _"solve"_ the trivial part), in which case I wouldn't call this an answer. – Bart Feb 21 '20 at 12:05
  • 1
    It was not meant to solve the problem, I only wanted to give the user a headstart. Also it was too much to place in a comment, so I had to write as "answer". – kalzso Feb 21 '20 at 12:16