-1

So I have a Dataset like this:

     Customer_id   Lat        Lon
 0.     A          40         12
 1.     A          np.nan     np.nan
 2.     A          np.nan     np.nan
 3.     A          43         12
 4.     A          45         13
 5.     B          43         14
 6.     B          np.nan     np.nan
 7.     B          43         16

Where the coordinates (40,12),(43,12),(45,13),(43,14) and (43,16) are the cell towers of a certain network.

Then I apply some interpolation functions and it results in something like the following:

     Customer_id   Lat        Lon
 0.     A          40         12
 1.     A          41         12
 2.     A          42         12
 3.     A          43         12
 4.     A          45         13
 5.     B          43         14
 6.     B          43         15
 7.     B          43         16

But these new coordinates are just estimates and not actual towers. I would like to then assign these estimations to the closest actual tower so that, for example, record 1 would be assigned to the tower (40,12).

I used this code

def haversine_closest_changed(towers, row):
    all_points= towers
    lat2= all_points[:,0] #the actual latitudes of the towers
    lon2= all_points[:,1] #the actual longitudes of the towers
    l=len(lat2) #how many towers are there
    lat1=row['Expected_Lat']  #make a column with the actual latitude my value and all the towers, 
    #the point I'm looking at multiple times
    lon1=row['Expected_Lon']  #find the min distance and output the minimum 

    lat1, lon1, lat2, lon2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = np.sin(dlon/2.0)**2 + np.cos(lon1) * np.cos(lon2) * np.sin(dlat/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    idx=np.argmin(km)
    closest_point=towers[idx,]

    return closest_point

where towers is a pandas dataset with all the towers that exist in the network (one column for Latitude and another for Longitude) and the columns Expected_Lat and Expected_Lon are what I called the columns after I did the interpolation.

This piece of code returns me only 1 value for the latitude and 1 value for the longitude repeated throughout the whole column. How can I change this code to replace only the points that I have interpolated/the points that were previously NaNs by the closest tower?

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
Nocas
  • 357
  • 1
  • 4
  • 14

1 Answers1

1

First we mark the rows to be interpolated, then interpolate and finally, with the closest distance calculation from this SO answer we find the nearest actual tower for all interpolated entries:

import pandas as pd
import io
from math import cos, asin, sqrt

s="""     Customer_id   Lat        Lon
 0.     A          40         12
 1.     A          np.nan     np.nan
 2.     A          np.nan     np.nan
 3.     A          43         12
 4.     A          45         13
 5.     B          43         14
 6.     B          np.nan     np.nan
 7.     B          43         16 
"""


df = pd.read_csv(io.StringIO(s), na_values='np.nan', sep='\s\s+', engine='python')
df['Interpolated'] = df.Lat.isnull()
df = df.interpolate()
towers = df.loc[~df.Interpolated,['Lat','Lon']].drop_duplicates().values

def distance(lat1, lon1, lat2, lon2):
    p = 0.017453292519943295
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p)*cos(lat2*p) * (1-cos((lon2-lon1)*p)) / 2
    return 12742 * asin(sqrt(a))

def closest_tower(row):
    return min(towers, key=lambda p: distance(row.Lat, row.Lon, p[0], p[1]))

df.loc[df.Interpolated,['Lat','Lon']] = df.loc[df.Interpolated,['Lat','Lon']].apply(closest_tower ,axis=1)

Result:

    Customer_id   Lat   Lon  Interpolated
0.0           A  40.0  12.0         False
1.0           A  40.0  12.0          True
2.0           A  43.0  12.0          True
3.0           A  43.0  12.0         False
4.0           A  45.0  13.0         False
5.0           B  43.0  14.0         False
6.0           B  43.0  14.0          True
7.0           B  43.0  16.0         False
Stef
  • 28,728
  • 2
  • 24
  • 52
  • Thank you for your answer! I just seem to keep getting an error with this. I used as df the. full dataframe I have and as towers the full dataframe of the towers with column names 'Lat' and 'Lon' just like you have in your answer. But for me it returns TypeError: ("ufunc 'subtract' did not contain a loop with signature matching types dtype(' – Nocas Sep 01 '19 at 09:20
  • without some context it is difficult to say what the issue is with your data. Are you sure that your data are numeric and not strings or anyting else (check with `df.info()`)? – Stef Sep 01 '19 at 14:19