
I have a dataframe with longitude and latitude data for houses.

I also have a second dataframe which has longitude and latitude data for supermarkets.

I wrote a script using iterrows to find the closest supermarket, but I'm fairly sure it's bad practice and would like to know how to improve it.

import geopy.distance

# For each house, compute the geodesic distance (km) to every supermarket,
# then store the retailer of the nearest one in 'closest_store_brand'.
for index, row in small_housing_df.iterrows():
    property_ll = (row['latitude'], row['longitude'])
    adjusted_supermarket_df['dis_from_house'] = [geopy.distance.geodesic(property_ll, (x, y)).km
                                                 for x, y in zip(adjusted_supermarket_df['lat_wgs'],
                                                                 adjusted_supermarket_df['long_wgs'])]
    small_housing_df.at[index, 'closest_store_brand'] = \
        adjusted_supermarket_df.loc[adjusted_supermarket_df['dis_from_house'].idxmin(), 'retailer']
    print(f"{index} of {len(small_housing_df)}")  # progress output

This iterates over my housing dataset and, for each row, calculates the distance to every supermarket, then fills a column called "closest_store_brand" with the retailer of the supermarket that is closest to the house.

I'm fairly sure the list comprehension is the fastest way to calculate the distances from a house, but I don't know how to do the next step quickly.
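As a sketch of a vectorized alternative to the list comprehension (not geopy's own method): NumPy broadcasting can compute every house-to-store distance in one array operation, and `argmin` then picks the nearest store for each house. This swaps geopy's ellipsoidal `geodesic` for the spherical haversine formula, so distances can differ slightly; the function name `haversine_matrix` and the coordinates below are made up for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; assumes a spherical Earth

def haversine_matrix(house_lat, house_lon, store_lat, store_lon):
    """Return an (n_houses, n_stores) array of great-circle distances in km."""
    # Degrees -> radians; [:, None] and [None, :] set up row/column broadcasting
    lat1 = np.radians(np.asarray(house_lat, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(house_lon, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(store_lat, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(store_lon, dtype=float))[None, :]
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Hypothetical coordinates: 2 houses, 3 stores
dist = haversine_matrix([0.0, 10.0], [0.0, 10.0], [0.0, 0.0, 10.0], [1.0, 5.0, 10.5])
nearest = dist.argmin(axis=1)  # position of the closest store for each house
```

With the question's frames, `dist` would be built from the `latitude`/`longitude` and `lat_wgs`/`long_wgs` columns, and the column could then be filled in one step with `small_housing_df['closest_store_brand'] = adjusted_supermarket_df['retailer'].to_numpy()[nearest]`. The matrix holds n_houses × n_stores floats, so very large inputs may need to be processed in chunks of houses.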

How can I rewrite this .at() call so that I'm not updating row by row?
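A minimal sketch that keeps the per-house loop but collects the nearest retailer for each house in a plain list and assigns the whole column once, instead of writing each result with `.at`. To stay self-contained it replaces geopy's `geodesic` with a hand-written spherical distance; `haversine_km`, the toy coordinates, and the retailer names are all hypothetical.

```python
import math
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0088):
    """Great-circle distance in km on a spherical Earth (stand-in for geodesic)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Toy data using the column names from the question (values are made up)
small_housing_df = pd.DataFrame({'latitude': [51.50, 51.52], 'longitude': [-0.12, -0.08]})
adjusted_supermarket_df = pd.DataFrame({
    'lat_wgs': [51.501, 51.521],
    'long_wgs': [-0.121, -0.081],
    'retailer': ['StoreA', 'StoreB'],  # hypothetical retailer names
})

store_points = list(zip(adjusted_supermarket_df['lat_wgs'],
                        adjusted_supermarket_df['long_wgs'],
                        adjusted_supermarket_df['retailer']))

closest = []  # one retailer name per house, in row order
for lat, lon in zip(small_housing_df['latitude'], small_housing_df['longitude']):
    # min() with a key returns the store tuple with the smallest distance
    best = min(store_points, key=lambda s: haversine_km(lat, lon, s[0], s[1]))
    closest.append(best[2])

# One column assignment replaces the per-row .at writes
small_housing_df['closest_store_brand'] = closest
```

This still makes n_houses × n_stores Python-level distance calls, so the `.at` write is unlikely to be the main cost; the per-pair distance computation usually dominates, which is why vectorizing that step tends to matter more.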

Violatic
Check this answer https://stackoverflow.com/questions/6656475/python-speeding-up-geographic-comparison and this one https://stackoverflow.com/questions/34502254/vectorizing-haversine-distance-calculation-in-python?rq=1 – Atanas Atanasov, Jun 12 '22 at 20:58
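The second linked question is about vectorizing the haversine formula, which treats the Earth as a sphere of radius R (about 6371 km) and is therefore an approximation of the ellipsoidal geodesic that geopy computes. For latitudes φ and longitudes λ in radians, the distance between points 1 and 2 is:

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \arcsin\!\left(\sqrt{a}\right)
```

Because every term is an elementwise trigonometric operation, it can be evaluated for all house/store pairs at once, and the closest store for house i is the index j that minimizes d_ij.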

0 Answers