I have two data frames of different size of A(236 x 4) and B(967 x 4). I want to iterate through rows of both data frames with a nested loop that takes the input of both FIPS columns and if they're equal I want them to spit out a haversine distance.
Header of one data frame looks like this,
AddressClinic CountyClinicFIPS Lat Lon\
0 9137 ST.RT. 136, WEST UNION, OH, 45693 1.0 -83.60553 38.809795
1 940 N. Cable Rd., Lima, OH, 45805 3.0 -84.14703 40.755620
2 934 Center St. Ste E, Ashland, OH, 44805 5.0 -82.31091 40.859657
3 934 Center ST Ste. E, Ashland, OH, 44805 5.0 -82.31091 40.859657
4 3225 Lake Avenue, Ashtabula, OH, 44004 7.0 -80.79042 41.876133
The other data frame is the exact same format with different address information in the same format.
InputAddr CountyRetailerFIPS Lat Lon\
0 16782 ST RT 125, WEST UNION, OH, 45693 1 -83.41653 38.771553
1 156 NORTH MAIN ST, PEEBLES, OH, 45660 1 -83.40529 38.949960
2 18811 SR 136 , WINCHESTER, OH, 45697 1 -83.65418 38.937350
3 2100 HARDING HIGHWAY #12, LIMA, OH, 45804 3 -84.06680 40.730656
4 1102 ELIDA AVE, DELPHOS, OH, 45833 3 -84.32622 40.842110
Code I've been attempting is an almagmation that looks like this
import pandas as pd
import numpy as np
from numpy import cos, sin, arcsin, sqrt
from math import radians
dataclinic=pd.read_csv(r"C:\Users\Jack\Documents\Schoolwork\WICResearch\MyDocs\getdistanceclinic.csv")
dataretailer=pd.read_csv(r"C:\Users\Jack\Documents\Schoolwork\WICResearch\MyDocs\getdistanceretailer.csv")
datadistance = pd.DataFrame([])
def haversine(row):
lon1 = row1['Lon']
lat1 = row1['Lat']
lon2 = row2['Lon']
lat2 = row2['Lat']
lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
dlon = lon2 - lon1
dlat = lat2 - lat1
a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
c = 2 * arcsin(sqrt(a))
km = 6367 * c
return km
for index1, row1 in dataclinic.iterrows():
for index2, row2 in dataretailer.iterrows():
if row1['CountyClinicFIPS'] == row2['CountyRetailerFIPS']:
datadistance = datadistance.apply(lambda row: haversine(row), axis=1)
else:
continue
print(datadistance)
And unsurprisingly it doesn't work. It does something but not much. It returns datadistance as a series object of (0,) index1,2 as int 235,966 row1,2 as Series of values (4,) both which are the elements of the first row of the data frames.
The expected outcome is to have datadistance be filled with all the cases where ClinicFips = RetailerFips and the distance between the two computed and listed to make averaging between counties easier.
Perhaps I am going about this the wrong way. Any advice is greatly appreciated. Thanks.