I'm working with two dataframes. One holds a set of locations with their coordinates (latitude, longitude). The other is a weather data set with records from weather stations all over the world, also with coordinates. I am trying to link each location in my data set to its nearest weather station. The station names and my location names do not match.

I am left trying to link them by closest match in coordinates and have no idea where to begin.

I was thinking of something like:

np.abs((location['latitude']-weather['latitude'])+(location['longitude']-weather['longitude']))

Examples of each

location...

Location   Latitude   Longitude Component
     A  39.463744  -76.119411    Active   
     B  39.029252  -76.964251    Active   
     C  33.626946  -85.969576    Active   
     D  49.286337   10.567013    Active   
     E  37.071777  -76.360785    Active   

weather...

     Station Code             Station Name  Latitude  Longitude
     US1FLSL0019    PORT ST. LUCIE 4.0 NE   27.3237   -80.3111
     US1TXTV0133            LAKEWAY 2.8 W   30.3597   -98.0252
     USC00178998                  WALTHAM   44.6917   -68.3475
     USC00178998                  WALTHAM   44.6917   -68.3475
     USC00178998                  WALTHAM   44.6917   -68.3475

The desired output is a new column on the location dataframe holding the name of the closest station.

However, I am not sure how to loop through both dataframes to accomplish this. Any help would be greatly appreciated.

Thanks, Scott

sokeefe1014

2 Answers

Let's say you have a distance function dist that you want to minimize:

def dist(lat1, long1, lat2, long2):
    # Take the absolute value per axis so opposite-sign offsets cannot cancel.
    return np.abs(lat1-lat2) + np.abs(long1-long2)
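
To illustrate why the exact form of the distance expression matters, here is a small check using location C and the PORT ST. LUCIE station from the question's tables. Taking the absolute value of the summed differences lets a positive latitude offset cancel a negative longitude offset, while taking it per axis does not. The two function names are made up for this comparison.

```python
import numpy as np

def dist_summed(lat1, long1, lat2, long2):
    # Absolute value of the summed differences: opposite signs can cancel.
    return np.abs((lat1 - lat2) + (long1 - long2))

def dist_per_axis(lat1, long1, lat2, long2):
    # Absolute value of each difference, then summed (Manhattan distance in degrees).
    return np.abs(lat1 - lat2) + np.abs(long1 - long2)

# Location C and the PORT ST. LUCIE station from the example tables.
c = (33.626946, -85.969576)
psl = (27.3237, -80.3111)

summed = dist_summed(*c, *psl)      # about 0.64
per_axis = dist_per_axis(*c, *psl)  # about 11.96
print(summed, per_axis)
```

The summed form reports well under one degree even though the two points are roughly six degrees apart on each axis.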

For a given location, you can find the nearest station as follows:

lat = 39.463744
long = -76.119411
weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
    axis=1)

This will calculate the distance to all weather stations. Using idxmin you can find the closest station name:

distances = weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
    axis=1)
weather.loc[distances.idxmin(), 'StationName']

Let's put all this in a function:

def find_station(lat, long):
    distances = weather.apply(
        lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
        axis=1)
    return weather.loc[distances.idxmin(), 'StationName']

You can now get all the nearest stations by applying it to the locations dataframe:

locations.apply(
    lambda row: find_station(row['Latitude'], row['Longitude']), 
    axis=1)

Output:

0         WALTHAM
1         WALTHAM
2    PORTST.LUCIE
3         WALTHAM
4    PORTST.LUCIE
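
Nesting one `apply` inside another evaluates `dist` once per location/station pair in Python, which may be slow for a worldwide station list. A possible vectorized sketch using NumPy broadcasting is shown here. It assumes the column names from the question's tables (`Latitude`, `Longitude`, `Station Name`), uses per-axis absolute differences in degrees as the metric, and the helper name `nearest_station_names` is hypothetical.

```python
import numpy as np
import pandas as pd

def nearest_station_names(locations, weather):
    # (n_locations, 1) minus (n_stations,) broadcasts to an
    # (n_locations, n_stations) matrix of coordinate differences.
    dlat = locations['Latitude'].to_numpy()[:, None] - weather['Latitude'].to_numpy()
    dlon = locations['Longitude'].to_numpy()[:, None] - weather['Longitude'].to_numpy()
    distances = np.abs(dlat) + np.abs(dlon)
    # Position of the closest station for every location.
    nearest = distances.argmin(axis=1)
    names = weather['Station Name'].to_numpy()[nearest]
    return pd.Series(names, index=locations.index, name='closest_station')

# Sample rows copied from the question (duplicate WALTHAM rows omitted).
locations = pd.DataFrame({
    'Location': ['A', 'B', 'C', 'D', 'E'],
    'Latitude': [39.463744, 39.029252, 33.626946, 49.286337, 37.071777],
    'Longitude': [-76.119411, -76.964251, -85.969576, 10.567013, -76.360785],
})
weather = pd.DataFrame({
    'Station Name': ['PORT ST. LUCIE 4.0 NE', 'LAKEWAY 2.8 W', 'WALTHAM'],
    'Latitude': [27.3237, 30.3597, 44.6917],
    'Longitude': [-80.3111, -98.0252, -68.3475],
})

result = nearest_station_names(locations, weather)
print(result)
```

The full distance matrix holds one value per location/station pair, so for very large inputs it may need to be built in chunks.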
IanS
  • For the minimum distance between two lat/lon points, shouldn't it be `sqrt((x1-x2)^2+(y1-y2)^2)`? Even that still assumes a plane; on a sphere a different formula is needed. – CoderBC Apr 25 '16 at 15:09
  • Appreciate the answer! Still finalizing to make sure it all works. I did have to update the dist function to put an np.abs around the latitude difference and another around the longitude difference. When latitude was off by a positive amount and longitude by a negative amount, they offset each other and gave me something not even close. Other than that I believe it works perfectly. Would I then just merge the output to the locations dataframe on index? – sokeefe1014 Apr 25 '16 at 15:24
  • @sokeefe1014 the best way to include the result in the original dataframe is probably something like `locations['closest_station'] = locations.apply(lambda row: ..., axis=1)`. – IanS Apr 25 '16 at 15:28
  • Thank you so much! – sokeefe1014 Apr 25 '16 at 15:30
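
Picking up the two points raised in the comments: on a sphere, great-circle distance is the more appropriate metric, and the result can be written straight into a new column. The sketch below uses the haversine formula with a mean Earth radius of 6371 km (an approximation). The names `haversine_km` and `closest_station` are chosen for illustration, and the column names assume the tables shown in the question.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in km; inputs in decimal degrees (scalars or arrays).
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def find_station(lat, lon, weather):
    # Distance from one location to every station in a single vectorized call.
    distances = haversine_km(lat, lon, weather['Latitude'], weather['Longitude'])
    return weather.loc[distances.idxmin(), 'Station Name']

# Sample rows copied from the question (duplicate WALTHAM rows omitted).
locations = pd.DataFrame({
    'Location': ['A', 'B', 'C', 'D', 'E'],
    'Latitude': [39.463744, 39.029252, 33.626946, 49.286337, 37.071777],
    'Longitude': [-76.119411, -76.964251, -85.969576, 10.567013, -76.360785],
})
weather = pd.DataFrame({
    'Station Name': ['PORT ST. LUCIE 4.0 NE', 'LAKEWAY 2.8 W', 'WALTHAM'],
    'Latitude': [27.3237, 30.3597, 44.6917],
    'Longitude': [-80.3111, -98.0252, -68.3475],
})

# Store the nearest station name as a new column on the locations dataframe.
locations['closest_station'] = locations.apply(
    lambda row: find_station(row['Latitude'], row['Longitude'], weather), axis=1)
print(locations)
```

With these sample rows, location E comes out nearest to WALTHAM under the great-circle metric rather than PORT ST. LUCIE, which shows that the choice of metric can change the match.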

I appreciate that this is a bit messy, but I used something similar to match genetic data between tables. It relies on the longitude and latitude in the location file being within 5 degrees of those in the weather file, but that threshold can be changed if need be.

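# For each location, look for a station within +/- 5 degrees on both axes.
# Note: the last station found inside the box is kept, not the nearest one.
rows = range(location.shape[0])
weath_rows = range(weather.shape[0])
for r in rows:
    lat = location.iloc[r, 1]
    max_lat = lat + 5
    min_lat = lat - 5
    lon = location.iloc[r, 2]
    max_lon = lon + 5
    min_lon = lon - 5
    for w in weath_rows:
        if (min_lat <= weather.iloc[w, 2] <= max_lat) and (min_lon <= weather.iloc[w, 3] <= max_lon):
            # Write to row r only; assigning to location['Station_Name'] would overwrite the whole column.
            location.loc[location.index[r], 'Station_Name'] = weather.iloc[w, 1]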
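
The loop above keeps the last in-range station it encounters rather than the nearest one. One possible variant, sketched here, uses the same kind of bounding box as a cheap pre-filter and then picks the closest candidate inside it. The function name `nearest_within_box` and the `half_width` parameter are hypothetical, the column names assume the question's tables, and the squared planar distance in degrees is only a rough ranking that ignores the Earth's curvature.

```python
import pandas as pd

def nearest_within_box(lat, lon, weather, half_width=5.0):
    # Boolean mask of stations inside the +/- half_width degree box.
    in_box = (weather['Latitude'].between(lat - half_width, lat + half_width)
              & weather['Longitude'].between(lon - half_width, lon + half_width))
    candidates = weather[in_box]
    if candidates.empty:
        return None  # no station inside the box
    # Squared planar distance in degrees, used only to rank the candidates.
    d2 = (candidates['Latitude'] - lat) ** 2 + (candidates['Longitude'] - lon) ** 2
    return candidates.loc[d2.idxmin(), 'Station Name']

# Sample stations copied from the question (duplicate WALTHAM rows omitted).
weather = pd.DataFrame({
    'Station Name': ['PORT ST. LUCIE 4.0 NE', 'LAKEWAY 2.8 W', 'WALTHAM'],
    'Latitude': [27.3237, 30.3597, 44.6917],
    'Longitude': [-80.3111, -98.0252, -68.3475],
})

# None of the sample stations lies within 5 degrees of the sample locations,
# so a wider 10 degree box is used for this demonstration.
station_a = nearest_within_box(39.463744, -76.119411, weather, half_width=10.0)
station_d = nearest_within_box(49.286337, 10.567013, weather, half_width=10.0)
print(station_a, station_d)
```

Returning `None` when the box is empty makes unmatched locations easy to spot, at the cost of having to choose a box size that suits the station density.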
EllieFev