I have two dataframes: radar, which represents data on an equispaced grid with columns for longitude, latitude and height value, and ice, which has some information related to satellite observations, including the latitude and longitude of each observation. I want to merge the two so that ice gets the 'height' column from radar, based on the geodesic distance from each ice row to the closest radar point.
I'm currently doing it like this:
from geopy.distance import geodesic
import pandas as pd

def get_distance(out):
    # geodesic distance (km) from this ice row to every radar point
    dists = radar['latlon'].apply(lambda x: geodesic(out['latlon'], x).km)
    out['dist to radar'] = dists.min()
    # value taken from the closest radar point
    out['rate_yr_radar'] = radar.loc[dists.idxmin(), 'rate_yr_radar']
    return out

ICEvsRadar = ice.apply(get_distance, axis=1)
But it's very slow: I have around 200 points in my ice dataframe and around 50000 in the radar one. Is slow performance to be expected given the computational cost of calculating each distance, or could I improve how I apply the function?
Edit: I uploaded the example data to https://wetransfer.com/downloads/284036652e682a3e665994d360a3068920221203230651/5842f2
The code takes around 25 minutes to run. ice is 180 rows long and has lon, lat and latlon fields; radar has 50000 rows with lon, lat, latlon and rate_yr_radar fields.
Edit: Using the help from the comment by Atanas, I ended up solving it like this:
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree

# build the tree; the haversine metric expects [lat, lon] in radians
Tree = BallTree(np.deg2rad(radar[['lat', 'lon']].values), metric='haversine')

# query the nearest neighbour (k=1 by default) for every ice point
distance, index = Tree.query(np.deg2rad(ice.loc[:, ["lat", "lon"]]))

# get the relevant data from radar to merge with ice
# (index holds row positions, so .loc assumes radar has a default 0..n-1 index)
reduced_radar = radar.loc[np.concatenate(index), ["rate_yr_radar"]]
reduced_radar['dist to radar'] = np.concatenate(distance) * 6371  # radians -> km
reduced_radar = reduced_radar.reset_index().rename({"index": "index_from_radar"}, axis=1)

# join row by row (assumes ice also has a default 0..n-1 index)
ice = ice.join(reduced_radar)
It went from a 30-minute runtime to 60 milliseconds!
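For anyone who wants to cross-check the BallTree result without scikit-learn, a brute-force haversine in plain NumPy could look roughly like the sketch below. The function name nearest_haversine and the toy coordinates are made up for illustration and are not part of my data. It builds the full distance matrix in memory, so it assumes that a (rows of ice) x (rows of radar) float array fits in RAM. Like the BallTree version, it uses a spherical Earth with a radius of 6371 km, so the distances can differ slightly from geopy's ellipsoidal geodesic.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # same mean Earth radius as in the BallTree snippet


def nearest_haversine(query_deg, ref_deg):
    """For each (lat, lon) pair in query_deg (degrees), return the position of
    the nearest (lat, lon) pair in ref_deg and the haversine distance in km."""
    q = np.deg2rad(np.asarray(query_deg, dtype=float))[:, None, :]  # (n_query, 1, 2)
    r = np.deg2rad(np.asarray(ref_deg, dtype=float))[None, :, :]    # (1, n_ref, 2)

    dlat = r[..., 0] - q[..., 0]  # broadcasts to (n_query, n_ref)
    dlon = r[..., 1] - q[..., 1]

    # haversine formula; clip guards against tiny floating-point overshoots
    a = np.sin(dlat / 2) ** 2 + np.cos(q[..., 0]) * np.cos(r[..., 0]) * np.sin(dlon / 2) ** 2
    dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    idx = dist_km.argmin(axis=1)                      # nearest reference point per query
    return idx, dist_km[np.arange(len(idx)), idx]     # positions and distances in km


# hypothetical toy data, columns are [lat, lon] in degrees
radar_pts = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
ice_pts = np.array([[0.1, 0.9], [9.0, 9.5]])

idx, dist = nearest_haversine(ice_pts, radar_pts)
print(idx)   # [1 2]
print(dist)  # distances in km to the matched radar points
```

With the real dataframes, the inputs would be ice[['lat', 'lon']].values and radar[['lat', 'lon']].values, and idx should match np.concatenate(index) from the BallTree query, apart from possible ties.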