1

I have two dataframes: 1) Contains a list of suppliers and their Lat,Long coordinates

sup_essential = pd.DataFrame({'supplier': ['A','B','C'],
                              'coords': [(51.1235,-0.3453),(52.1245,-0.3423),(53.1235,-1.4553)]})

2) A list of stores and their lat, long coordinates

stores_essential = pd.DataFrame({'storekey': [1,2,3],
                              'coords': [(54.1235,-0.6553),(49.1245,-1.3423),(50.1235,-1.8553)]})

I want to create an output table that has: store, store_coordinates, supplier, supplier_coordinates, distance for every combination of store and supplier.

I currently have:

test=[]
for row in sup_essential.iterrows():
    for row in stores_essential.iterrows():
        r = sup_essential['supplier'],stores_essential['storeKey']
        test.append(r)

But this just gives me repeats of all the values

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
PaddyD15
  • 13
  • 3
  • Please provide small (3-7 rows) reproducible data sets in text/CSV format and desired data set. Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – MaxU - stand with Ukraine Apr 16 '17 at 09:17
  • @MaxU data itself is confidential and given it is coordinates it would be quite easy to identify. However, the headers are is: For stores: storeKey (int) locationLongitude locationLatitude coords (lat, long) For suppliers: supplier (varchar) latitude longitude coords (lat, long) – PaddyD15 Apr 16 '17 at 09:31
  • You don't need to specify the real data. Just [post](http://stackoverflow.com/posts/43435657/edit) sample (fake) data sets in your question – MaxU - stand with Ukraine Apr 16 '17 at 09:33
  • Thanks @MaxU - have added in now – PaddyD15 Apr 16 '17 at 09:46
  • your code isn't working. Beside that - do you really have coordinates as strings in your real data sets? – MaxU - stand with Ukraine Apr 16 '17 at 09:49
  • @MaxU input error - try now – PaddyD15 Apr 16 '17 at 09:56

1 Answers1

0

Source DFs

In [105]: sup
Out[105]:
               coords supplier
0  (51.1235, -0.3453)        A
1  (52.1245, -0.3423)        B
2  (53.1235, -1.4553)        C

In [106]: stores
Out[106]:
               coords  storekey
0  (54.1235, -0.6553)         1
1  (49.1245, -1.3423)         2
2  (50.1235, -1.8553)         3

Solutions:

from sklearn.neighbors import DistanceMetric
dist = DistanceMetric.get_metric('haversine')

m = pd.merge(sup.assign(x=0), stores.assign(x=0), on='x', suffixes=['1','2']).drop('x',1)

d1 = sup[['coords']].assign(lat=sup.coords.str[0], lon=sup.coords.str[1]).drop('coords',1)
d2 = stores[['coords']].assign(lat=stores.coords.str[0], lon=stores.coords.str[1]).drop('coords',1)

m['dist_km'] = np.ravel(dist.pairwise(np.radians(d1), np.radians(d2)) * 6367)
## -- End pasted text --

Result:

In [135]: m
Out[135]:
              coords1 supplier             coords2  storekey     dist_km
0  (51.1235, -0.3453)        A  (54.1235, -0.6553)         1  334.029670
1  (51.1235, -0.3453)        A  (49.1245, -1.3423)         2  233.213416
2  (51.1235, -0.3453)        A  (50.1235, -1.8553)         3  153.880680
3  (52.1245, -0.3423)        B  (54.1235, -0.6553)         1  223.116901
4  (52.1245, -0.3423)        B  (49.1245, -1.3423)         2  340.738587
5  (52.1245, -0.3423)        B  (50.1235, -1.8553)         3  246.116984
6  (53.1235, -1.4553)        C  (54.1235, -0.6553)         1  122.997130
7  (53.1235, -1.4553)        C  (49.1245, -1.3423)         2  444.459052
8  (53.1235, -1.4553)        C  (50.1235, -1.8553)         3  334.514028
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419