I have attached a sample of my dataset. I have minimal pandas experience, so I'm struggling to formulate the problem.
What I'm trying to do is populate the 'dist' column with the Cartesian distance between p1 = (lat1, long1) and p2 = (lat2, long2) for each index, based on the state and the county.
Each county may have multiple p1's; we use the one nearest to p2 when computing the distance. When a county doesn't have a p1 value, we simply use the next one that comes in the sequence.
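As a rough illustration of the "nearest p1" rule only, the per-row calculation can be sketched with NumPy. This assumes "Cartesian" means plain Euclidean distance on the raw lat/long values, and `nearest_distance` is a hypothetical helper name, not an existing pandas function:

```python
import numpy as np

def nearest_distance(p2, candidates):
    """Euclidean distance from p2 to the closest point in candidates.

    p2: a (lat2, long2) pair.
    candidates: array-like of (lat1, long1) pairs available to this row.
    Returns NaN when there is no candidate p1 at all.
    """
    cands = np.asarray(candidates, dtype=float).reshape(-1, 2)
    if len(cands) == 0:
        return np.nan  # nothing to measure against
    diffs = cands - np.asarray(p2, dtype=float)
    # Distance to every candidate, then keep the smallest one.
    return float(np.sqrt((diffs ** 2).sum(axis=1)).min())

# Anchorage row with p2 = (3, 2) against Anchorage's p1 points.
print(nearest_distance((3.0, 2.0), [(1, 2), (3, 4), (0, 2)]))
```

Choosing which candidates to pass in (same state/county, or the fallback p1) is left to the caller here.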
How do I set up this problem concisely? I can imagine running an iterator over the county/state pairs, but I can't get beyond that.
[EDIT] Here is the DataFrame head, as suggested below (ignore the mismatch with the picture):
    lat1  long1  state  county            lat2   long2
0   .     .      AK     Aleutians West    11.0   23.0
1   .     .      AK     Wade Hampton      33.0   11.0
2   .     .      AK     North Slope       55.0   11.0
3   .     .      AK     Kenai Peninsula   44.0   11.0
4   .     .      AK     Anchorage         11.0   11.0
5   1     2      AK     Anchorage         NaN    NaN
6   .     .      AK     Anchorage         55.0   44.0
7   3     4      AK     Anchorage         NaN    NaN
8   .     .      AK     Anchorage         3.0    2.0
9   .     .      AK     Anchorage         5.0    11.0
10  .     .      AK     Anchorage         42.0   22.0
11  .     .      AK     Anchorage         11.0   2.0
12  .     .      AK     Anchorage         444.0  1.0
13  .     .      AK     Anchorage         1.0    2.0
14  0     2      AK     Anchorage         NaN    NaN
15  .     .      AK     Anchorage         1.0    1.0
16  .     .      AK     Anchorage         111.0  11.0
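A minimal pandas sketch of the whole computation on the sample above. It rests on two assumptions: "." in lat1/long1 means a missing value (NaN), and "the next one in the sequence" means the first p1 row below the current row, regardless of county. `row_dist` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample rows from the question; "." is treated as NaN (assumption).
df = pd.DataFrame({
    "lat1":   [nan] * 5 + [1, nan, 3] + [nan] * 6 + [0, nan, nan],
    "long1":  [nan] * 5 + [2, nan, 4] + [nan] * 6 + [2, nan, nan],
    "state":  ["AK"] * 17,
    "county": ["Aleutians West", "Wade Hampton", "North Slope",
               "Kenai Peninsula"] + ["Anchorage"] * 13,
    "lat2":   [11, 33, 55, 44, 11, nan, 55, nan, 3, 5, 42, 11, 444, 1, nan, 1, 111],
    "long2":  [23, 11, 11, 11, 11, nan, 44, nan, 2, 11, 22, 2, 1, 2, nan, 1, 11],
})

# Rows that carry a p1 point.
p1_rows = df[df["lat1"].notna() & df["long1"].notna()]

def row_dist(row):
    """Distance from this row's p2 to the nearest applicable p1."""
    if pd.isna(row["lat2"]) or pd.isna(row["long2"]):
        return nan  # p1-only rows have no p2 to measure
    # Candidate p1 points from the same state and county.
    same = p1_rows[(p1_rows["state"] == row["state"])
                   & (p1_rows["county"] == row["county"])]
    if same.empty:
        # Fallback (assumed meaning): first p1 row after this one.
        same = p1_rows[p1_rows.index > row.name].head(1)
    if same.empty:
        return nan  # no p1 anywhere below this row
    d = np.hypot(same["lat1"] - row["lat2"], same["long1"] - row["long2"])
    return d.min()

df["dist"] = df.apply(row_dist, axis=1)
print(df)
```

Row-wise `apply` is easy to read but re-filters `p1_rows` for every row, so on a large frame it would likely be faster to loop over `df.groupby(["state", "county"])` and work on NumPy arrays per group.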