Please consider a DataFrame df1 with the following dtypes (df1.dtypes):

DAT_RUN             datetime64[ns]
DAT_FORECAST        datetime64[ns]
LIB_SOURCE          object
LONGITUDE           object
LATITUDE            object
MEASURE1            float64
MEASURE2            float64

First 12 rows (one subgroup, for a single DAT_RUN and DAT_FORECAST):

      DAT_RUN        DAT_FORECAST LIB_SOURCE LONGITUDE LATITUDE  MEASURE1 MEASURE2
0  2022-04-02 2022-04-02 01:00:00    gfs_025      43.5     3.75  5.542505     54.8
1  2022-04-02 2022-04-02 01:00:00    gfs_025      43.5      4.0 12.542505     57.7
2  2022-04-02 2022-04-02 01:00:00    gfs_025      43.5     4.25 10.842505     53.7
3  2022-04-02 2022-04-02 01:00:00    gfs_025      43.5      4.5  8.742505     49.1
4  2022-04-02 2022-04-02 01:00:00    gfs_025     43.75     3.75  2.042505     58.1
5  2022-04-02 2022-04-02 01:00:00    gfs_025     43.75      4.0  3.742505     46.9
6  2022-04-02 2022-04-02 01:00:00    gfs_025     43.75     4.25  4.942505     42.9
7  2022-04-02 2022-04-02 01:00:00    gfs_025     43.75      4.5  4.142505     45.5
8  2022-04-02 2022-04-02 01:00:00    gfs_025      44.0     3.75 -0.057495     58.3
9  2022-04-02 2022-04-02 01:00:00    gfs_025      44.0      4.0  1.942505     53.0
10 2022-04-02 2022-04-02 01:00:00    gfs_025      44.0     4.25  3.542505     47.0
11 2022-04-02 2022-04-02 01:00:00    gfs_025      44.0      4.5  4.242505     45.6

And a second DataFrame, df2, holding the target points:

df2
  LATITUDE LONGITUDE
0       x1        y1
1       x2        y2
2       x3        y3
3       x4        y4
4       x5        y5

I want to interpolate the df1 data onto the df2 points:

  1. Take each df1 subgroup, grouped by DAT_RUN and DAT_FORECAST (12 rows each).
  2. Within a subgroup, find the df1 rows nearest to each df2 point. For example, assume the first 3 rows (0, 1 and 2) of df1 are the ones nearest to the df2 point (x1, y1).

How can I interpolate and create a new row in df3 for each df2 point, with LATITUDE = x, LONGITUDE = y, and the mean (or another operation) of MEASURE1 and MEASURE2 over the nearest rows?

So from the 12 df1 rows of a subgroup we get 5 new rows (the number of rows in df2).

Here is the first df3 row:

df3:
      DAT_RUN        DAT_FORECAST LIB_SOURCE LONGITUDE LATITUDE MEASURE1                              MEASURE2
0  2022-04-02 2022-04-02 01:00:00    gfs_025        y1       x1 mean(5.542505, 12.542505, 10.842505)  mean(54.8, 57.7, 53.7)

Perhaps this can be done with scipy or with pygmt.grdtrack (https://www.pygmt.org/latest/api/generated/pygmt.grdtrack.html?highlight=grdtrack#pygmt.grdtrack), but I have no idea how to use the latter.

Thanks.

Theo75
  • Why do the longitude and latitude columns have dtype 'object' rather than 'float64'? – The_spider Apr 02 '22 at 09:16
  • I don't know. But because I need 15-digit precision for lon/lat, I converted them to Decimal beforehand: import decimal; decimal.getcontext().prec = 15; df["LONGITUDE"] = df["LONGITUDE"].astype(str).map(decimal.Decimal) – Theo75 Apr 02 '22 at 10:21
  • But dtypes returns the object type after the conversion... – Theo75 Apr 02 '22 at 10:22
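
A note on the dtype question raised in these comments: pandas has no native Decimal dtype, so a column of decimal.Decimal values is stored as generic Python objects, which is why dtypes reports object. A float64 already carries roughly 15 to 17 significant decimal digits, so converting back to float is usually enough for coordinates like these. The following is only a minimal sketch; the three sample values are made up for illustration.

```python
import decimal

import pandas as pd

# Hypothetical sample coordinates, standing in for the real LONGITUDE column.
df = pd.DataFrame({"LONGITUDE": [43.5, 43.75, 44.0]})

# The conversion from the comment above: each cell becomes a decimal.Decimal,
# which pandas can only store as a generic Python object.
as_decimal = df["LONGITUDE"].astype(str).map(decimal.Decimal)
print(as_decimal.dtype)  # object

# Converting back to float64 restores a numeric dtype that NumPy, SciPy
# and PyGMT can work with directly.
as_float = as_decimal.astype("float64")
print(as_float.dtype)  # float64
```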

1 Answer

I'm not sure I fully understand, so apologies if I have misinterpreted your question. If you want to sample the values of a grid (or of several grids) at specific coordinates, you can use the code below, changing the input grids and the coordinates in the dataframe df.

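import ensaio  # used to get example grids
import pandas as pd
import pygmt

# Two example global grids; replace these with your own grids
input_grid = ensaio.fetch_earth_topography(version=1)
input_grid2 = ensaio.fetch_earth_geoid(version=1)

# Coordinates at which to sample the grids
# (grdtrack reads the first two columns of the table as the x and y coordinates)
df = pd.DataFrame(data={'lat': [3.75, 4.0, 4.24], 'lon': [43.5, 43.75, 44]})
print('coordinates dataframe')
print(df)

# Sample each grid at the points; the values go in a new column named by newcolname
df = pygmt.grdtrack(points=df, grid=input_grid, newcolname='sampled_data1')
df = pygmt.grdtrack(points=df, grid=input_grid2, newcolname='sampled_data2')
print('sampled dataframe')
print(df)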

Output:

coordinates dataframe
    lat    lon
0  3.75  43.50
1  4.00  43.75
2  4.24  44.00
sampled dataframe
    lat    lon  sampled_data1  sampled_data2
0  3.75  43.50      56.687500      50.606252
1  4.00  43.75      35.062500      50.793751
2  4.24  44.00     125.509056      50.954522

Then, to get the mean of the two sampled values at each point, do the following (see https://stackoverflow.com/a/48366525/18686384):

df['mean']=df[['sampled_data1', 'sampled_data2']].mean(axis=1)
print(df)

    lat    lon  sampled_data1  sampled_data2       mean
0  3.75  43.50      56.687500      50.606252  53.646876
1  4.00  43.75      35.062500      50.793751  42.928126
2  4.24  44.00     125.509056      50.954522  88.231789
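
If the data only exist as a DataFrame like df1 rather than as grid files, the aggregation described in the question (average the df1 rows nearest to each df2 point, per DAT_RUN/DAT_FORECAST subgroup) can also be sketched in plain pandas and NumPy. This is a rough sketch under several assumptions, not a definitive solution: the df2 coordinates below are invented (the question only gives placeholders x1..x5), nearest_target is a hypothetical helper, "nearest" means plain Euclidean distance in degrees rather than great-circle distance, and the coordinates are treated as floats.

```python
import numpy as np
import pandas as pd

# One (DAT_RUN, DAT_FORECAST) subgroup of df1, using the values from the question.
df1 = pd.DataFrame({
    "DAT_RUN": pd.Timestamp("2022-04-02"),
    "DAT_FORECAST": pd.Timestamp("2022-04-02 01:00:00"),
    "LIB_SOURCE": "gfs_025",
    "LONGITUDE": np.repeat([43.5, 43.75, 44.0], 4),
    "LATITUDE": np.tile([3.75, 4.0, 4.25, 4.5], 3),
    "MEASURE1": [5.542505, 12.542505, 10.842505, 8.742505,
                 2.042505, 3.742505, 4.942505, 4.142505,
                 -0.057495, 1.942505, 3.542505, 4.242505],
    "MEASURE2": [54.8, 57.7, 53.7, 49.1, 58.1, 46.9,
                 42.9, 45.5, 58.3, 53.0, 47.0, 45.6],
})

# Hypothetical target points; the real df2 values are not given in the question.
df2 = pd.DataFrame({"LATITUDE": [4.0, 4.4], "LONGITUDE": [43.5, 43.9]})


def nearest_target(points, targets):
    """Return, for each row of points, the index of the closest row of targets.

    Uses squared Euclidean distance in coordinate units (an assumption).
    """
    diff = points[:, None, :] - targets[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)


src = df1[["LONGITUDE", "LATITUDE"]].to_numpy(dtype=float)
dst = df2[["LONGITUDE", "LATITUDE"]].to_numpy(dtype=float)

# Label every df1 row with the df2 point it is closest to.
df1["target"] = nearest_target(src, dst)

# Average the measures per subgroup and per target, then attach the df2 coordinates.
df3 = (
    df1.groupby(["DAT_RUN", "DAT_FORECAST", "LIB_SOURCE", "target"],
                as_index=False)[["MEASURE1", "MEASURE2"]]
    .mean()
    .merge(df2, left_on="target", right_index=True)
    .drop(columns="target")
    .reset_index(drop=True)
)
print(df3)
```

Each df1 row contributes to exactly one df2 point here. Swapping .mean() for another aggregation (median, max, a distance-weighted average) changes the "other operation" mentioned in the question, and a df2 point that is nearest to no df1 row simply does not appear in df3.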