1

I'm trying to access daily temperature values from a NetCDF for analysis but want to create summaries of temps (i.e. total number of days within a temperature range) within different administrative units. I have a global nc file and a shapefile with the admin units as well.

My plan is to read through the temp data by looping through the lat, lon, and time (the three temp parameters) and save the desired data to a list, but am having trouble conceptualizing how to limit my count to only the pixels in a specific polygon.

Since I'm working with a state that has a lot of administrative units, some of which are fairly small, it would not be ideal for me to use a bounding box rather than the exact shape of the polygon. All I need to do is be able to loop through the pixels within, write them out to somewhere else, and move on to the next unit.

Does anyone have any suggestions on how I can loop through the polygons and read just the pixels within each one?

I've never worked with NetCDFs before so I'm not really sure where to start. I'm able to access the data itself fine but am stuck on how to overlay these admin units.

  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Jan 10 '23 at 13:18
  • 1
    Hi, I'd check these out: https://stackoverflow.com/questions/36399381, https://stackoverflow.com/a/43897516 – Zach Young Jan 11 '23 at 00:18

2 Answers2

2

There are several algorithms for checking whether a point is inside a polygon, see e.g.: https://en.wikipedia.org/wiki/Point_in_polygon . You can use that to generate a mask, which you can then use to calculate statistics over only the masked grid ponts.

Using some functions that I wrote before, I quickly coded together this example. Without Numba (the @jit's which are now commented out) this works fine for small datasets, but it becomes very slow for large ones. With Numba, I get these timings for a 1024 x 1024 grid point dataset:

In [42]: timeit get_mask(mask, lons, lats, poly_x, poly_y)
37.7 ms ± 79.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Which to me seems fine.

import numpy as np
#from numba import jit

#@jit(nopython=True, nogil=True)
def is_left(xp, yp, x0, y0, x1, y1):
    """
    Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
    returns:  >0 if left of line, 0 if on line, <0 if right of line
    """

    return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)

#@jit(nopython=True, nogil=True)
def distance(x1, y1, x2, y2):
    """
    Calculate Euclidean distance.
    """
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

#@jit(nopython=True, nogil=True)
def point_is_on_line(x, y, x1, y1, x2, y2):
    """
    Check whether point it exactly on line
    """

    d1 = distance(x,  y,  x1, y1)
    d2 = distance(x,  y,  x2, y2)
    d3 = distance(x1, y1, x2, y2)

    eps = 1e-12
    return np.abs((d1+d2)-d3) < eps

#@jit(nopython=True, nogil=True)
def is_inside(xp, yp, x_set, y_set, size):
    """
    Given location (xp,yp) and set of line segments (x_set, y_set), determine
    whether (xp,yp) is inside (or on) polygon.
    """

    # First simple check on bounds
    if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
        return False

    wn = 0
    for i in range(size-1):

        # Second check: see if point exactly on line segment:
        if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
            return False

        #if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) == 0):
        #    return False

        # Calculate winding number
        if (y_set[i] <= yp):
            if (y_set[i+1] > yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                    wn += 1
        else:
            if (y_set[i+1] <= yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                    wn -= 1

    if (wn == 0):
        return False
    else:
        return True

#@jit(nopython=True, nogil=True)
def get_mask(mask, x, y, poly_x, poly_y):
    """
    Generate mask for grid points inside polygon
    """

    for j in range(y.size):
        for i in range(x.size):
            if is_inside(lons[i], lats[j], poly_x, poly_y, poly_x.size):
                mask[j,i] = True

    return mask


if __name__ == '__main__':
    import matplotlib.pyplot as pl
    pl.close('all')

    # Dummy data.
    lats = np.linspace(0, 20, 16)
    lons = np.linspace(0, 16, 16)
    
    data = np.arange(lats.size*lons.size).reshape((lats.size, lons.size))
    
    # Bounding box.
    poly_x = np.array([2, 12, 9, 6, 2])
    poly_y = np.array([4, 8.5, 15, 14, 4])
    
    # Generate mask for calculating statistics.
    mask = np.zeros_like(data, dtype=bool)
    get_mask(mask, lons, lats, poly_x, poly_y)
    
    # Calculate statistics.
    max_val = data[mask].max()
    
    # Plot data and mask.
    pl.figure(figsize=(10,4))
    pl.subplot(121)
    pl.title('data')
    pl.pcolormesh(lons, lats, data)
    pl.plot(poly_x, poly_y)
    pl.colorbar()
    
    pl.subplot(122)
    pl.title('averaging mask, max_value={}'.format(max_val))
    pl.pcolormesh(lons, lats, mask)
    pl.plot(poly_x, poly_y)
    pl.colorbar()
    
    pl.tight_layout()

enter image description here

Bart
  • 9,825
  • 5
  • 47
  • 73
1

I went little bit further with the answer by @Bart.

You are working with 3D data - time dimension and then latitude and longitude.

If one needs to calculate statistics in different regions, I would suggest making a 2D map with some integer like values for different regions with the same spatial size as the original input data.

Here is the updated code based on @Bart's initial solution to find the number occurrences of means of 2 different regions in some predefined range:

import numpy as np
#from numba import jit

#@jit(nopython=True, nogil=True)
def is_left(xp, yp, x0, y0, x1, y1):
    """
    Check whether point (xp,yp) is left of line segment ((x0,y0) to (x1,y1))
    returns:  >0 if left of line, 0 if on line, <0 if right of line
    """

    return (x1-x0) * (yp-y0) - (xp-x0) * (y1-y0)

#@jit(nopython=True, nogil=True)
def distance(x1, y1, x2, y2):
    """
    Calculate Euclidean distance.
    """
    return ((x1-x2)**2 + (y1-y2)**2)**0.5

#@jit(nopython=True, nogil=True)
def point_is_on_line(x, y, x1, y1, x2, y2):
    """
    Check whether point it exactly on line
    """

    d1 = distance(x,  y,  x1, y1)
    d2 = distance(x,  y,  x2, y2)
    d3 = distance(x1, y1, x2, y2)

    eps = 1e-12
    return np.abs((d1+d2)-d3) < eps

#@jit(nopython=True, nogil=True)
def is_inside(xp, yp, x_set, y_set, size):
    """
    Given location (xp,yp) and set of line segments (x_set, y_set), determine
    whether (xp,yp) is inside (or on) polygon.
    """

    # First simple check on bounds
    if (xp < x_set.min() or xp > x_set.max() or yp < y_set.min() or yp > y_set.max()):
        return False

    wn = 0
    for i in range(size-1):

        # Second check: see if point exactly on line segment:
        if point_is_on_line(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]):
            return False

        #if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) == 0):
        #    return False

        # Calculate winding number
        if (y_set[i] <= yp):
            if (y_set[i+1] > yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) > 0):
                    wn += 1
        else:
            if (y_set[i+1] <= yp):
                if (is_left(xp, yp, x_set[i], y_set[i], x_set[i+1], y_set[i+1]) < 0):
                    wn -= 1

    if (wn == 0):
        return False
    else:
        return True

#@jit(nopython=True, nogil=True)
def get_mask(mask, x, y, poly_x, poly_y):
    """
    Generate mask for grid points inside polygon
    """

    for j in range(y.size):
        for i in range(x.size):
            if is_inside(lons[i], lats[j], poly_x, poly_y, poly_x.size):
                mask[j,i] = True

    return mask


if __name__ == '__main__':
    import pandas as pd
    import matplotlib.pyplot as plt
    plt.close('all')
    # --------------------------------------------------------------------------------------------------------------
    # Dummy data.
    lats = np.linspace(0, 20, 16)
    lons = np.linspace(0, 16, 16)
    time = np.linspace(0,365,365)
    # --------------------------------------------------------------------------------------------------------------
    # let us make some random yearly data of temperature:
    np.random.seed(9)
    data = 0+100*(np.random.random((time.size,lats.size,lons.size))-0.5) # this is really random data
    temprange = [10,20]
    # --------------------------------------------------------------------------------------------------------------
    # let us have several areas:
    # Bounding box.
    poly_x = np.array([2, 12, 9, 6, 2])
    poly_y = np.array([4, 8.5, 15, 14, 4])
    # ---------------------------------------------------------------------------------------------------------------
    # Define areas for calculating statistics, I will use values 1 and 2 for in polygon and outside of it, one define a lot of different:
    regions = np.zeros((lats.size,lons.size))
    mask = np.zeros((lats.size,lons.size), dtype=bool)
    get_mask(mask, lons, lats, poly_x, poly_y)
    regions[mask==True] = 1
    regions[mask==False] = 2
    # ---------------------------------------------------------------------------------------------------------------
    # our "complicated" region map:
    fig = plt.figure(figsize=(10,10));ax = fig.add_subplot(111);
    p0 = ax.pcolormesh(lons,lats,regions);plt.colorbar(p0);
    ax.set_title('Different regions')
    plt.show()
    # ---------------------------------------------------------------------------------------------------------------
    # Let us find the number of days within some range:
    statsout = {}
    for regval in np.unique(regions):
        regmeantemp = np.average(data[:,regions==regval],axis=(1))  # this is mean serie for the polygon
        # -----------------------------------------------------------------------------------------------------------
        fig = plt.figure(figsize=(10,10));ax = fig.add_subplot(111)
        p0 = ax.plot(time,regmeantemp);
        ax.set_xlabel('Time');ax.set_ylabel('Mean temperature')
        ax.set_title('Mean inside basin set to value '+str(regval));
        plt.show()
        # -----------------------------------------------------------------------------------------------------------
        # let us collect the occurrences of temperature in some pre-defined range:
        statsout[regval] = np.sum((regmeantemp > temprange[0]) & (regmeantemp < temprange[1]))
    # ----------------------------------------------------------------------------------------------------------------
    print(statsout)

Hope this helps!

msi_gerva
  • 2,021
  • 3
  • 22
  • 28