I have the following pandas dataframe:

import pandas as pd
import math

df = pd.DataFrame()
df['x'] = [2, 1, 3]
df['y'] = [2, 5, 6]
df['weight'] = [11, 12, 13]
print(df)

     x    y   weight   
 0   2    2       11       
 1   1    5       12       
 2   3    6       13       

Suppose these 3 nodes are called a, b and c respectively. For each node, I want to sum the Euclidean distances to all the other nodes, multiply that sum by the node's weight, and then add up the results, as follows:

Sum = 11(d(a,b)+d(a,c)) + 12(d(b,a)+d(b,c)) + 13(d(c,a)+d(c,b))
arizamoona

1 Answer

Use SciPy's cdist -

In [72]: from scipy.spatial.distance import cdist

In [73]: a = df[['x','y']].values

In [74]: w = df.weight.values

In [100]: cdist(a,a).sum(1) * w
Out[100]: array([ 80.13921614,  64.78014765,  82.66925684])

We can also use a combination of pdist and squareform from the same SciPy module (scipy.spatial.distance) in place of cdist there.
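
A minimal sketch of that pdist/squareform variant, assuming the same data as in the question (the frame is rebuilt here only so the snippet runs on its own). pdist returns the condensed upper-triangle distances and squareform expands them into the full symmetric matrix, so the row sums should match the cdist result. The final `.sum()` is an addition, not part of the session above: it collapses the per-node values into the single Sum asked for in the question.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist, pdist, squareform

# Same data as in the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({'x': [2, 1, 3], 'y': [2, 5, 6], 'weight': [11, 12, 13]})
a = df[['x', 'y']].values
w = df['weight'].values

# pdist gives each pair once (condensed form); squareform expands it
# into the full symmetric n x n distance matrix with a zero diagonal.
dist = squareform(pdist(a))

# Weighted sum of distances per node, as with cdist(a, a).sum(1) * w.
per_node = dist.sum(axis=1) * w

# Single scalar matching the Sum formula in the question.
total = per_node.sum()
print(per_node, total)
```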

Verify with those actual values -

In [76]: from scipy.spatial.distance import euclidean

In [77]: euclidean([2,2],[1,5])*11 + euclidean([2,2],[3,6])*11
Out[77]: 80.139216143646451

In [78]: euclidean([1,5],[2,2])*12 + euclidean([1,5],[3,6])*12
Out[78]: 64.78014765201803

In [80]: euclidean([3,6],[2,2])*13 + euclidean([3,6],[1,5])*13
Out[80]: 82.669256840526856
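
As a further cross-check, the Sum formula from the question can be evaluated literally with plain Python loops. This is only a reference sketch; the names `nodes`, `weights` and `d` are illustrative and do not come from the original post. It is far slower than cdist for large inputs, but handy for confirming small cases.

```python
import math

# Illustrative names: node coordinates and weights from the question.
nodes = {'a': (2, 2), 'b': (1, 5), 'c': (3, 6)}
weights = {'a': 11, 'b': 12, 'c': 13}

def d(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Sum = 11(d(a,b)+d(a,c)) + 12(d(b,a)+d(b,c)) + 13(d(c,a)+d(c,b))
total = sum(
    weights[i] * sum(d(nodes[i], nodes[j]) for j in nodes if j != i)
    for i in nodes
)
print(total)
```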
Divakar
  • Thank you so much. This really helps me :) Anyway, will this program work if the number of nodes is 1000? – arizamoona Nov 07 '17 at 11:51
  • @arizamoona Depends on system RAM. But that should be okay on decent sized RAMs. With my 16 GB setup I can run `10000` nodes, just to give you an estimate. – Divakar Nov 07 '17 at 11:55
  • Oh, so you mean this calculation will take a long time to run if the number of nodes is 1000 or more? In your opinion, how many seconds/minutes will this calculation take for 1000 nodes? A rough estimate is fine. – arizamoona Nov 07 '17 at 11:58
  • @arizamoona Again, it depends on the setup. Why not try it at your end? The timing should scale with the size of the pairwise distance matrix, i.e. roughly with the square of the number of nodes. – Divakar Nov 07 '17 at 11:59
  • I see. Thank you again. You're so kind and helpful :) – arizamoona Nov 07 '17 at 12:04
  • Oh, I'm sorry, I think I deleted your last comment by mistake. Anyway, I just upvoted your answer; I'm not sure whether I did it right. Now, I used "start_time = time.time()" to find out how long a Python program takes to run, but it does not seem to work. When I run it, it always shows 0 seconds. Do you know how to deal with this? Sorry to bother you again. – arizamoona Nov 07 '17 at 12:30
  • @arizamoona No worries. Thanks for upvoting! On timing, try something like this - https://stackoverflow.com/a/2866456/? – Divakar Nov 07 '17 at 12:34
  • It seems that one shows the result in seconds. I want it to show the running time in both minutes and seconds. For example, if it took 10s, it should show 0mn 10s, and if it took 1mn 10s, it should show 1mn 10s. – arizamoona Nov 07 '17 at 12:54
  • @arizamoona It should work with float precision. So, even if it's less than 1 sec, it should show as a fraction; 0.5 would mean 0.5 sec. Alternatively, convert it to a function and time it with `%timeit` on the IPython console/shell - https://stackoverflow.com/a/8220961/. – Divakar Nov 07 '17 at 13:05
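
The timing discussion in the comments can be sketched as follows. This is a hedged example and not code from the thread: `format_elapsed` is a hypothetical helper, the 1000 nodes are random stand-ins for real data, and `time.perf_counter` is swapped in for `time.time` because it has finer resolution for short runs. Seconds are kept fractional so that a sub-second run does not print as 0.

```python
import time

import numpy as np
from scipy.spatial.distance import cdist

def format_elapsed(seconds):
    """Hypothetical helper: render a duration as e.g. '1mn 10.000s'."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}mn {secs:.3f}s"

# Random stand-in data: 1000 nodes with 2-D coordinates and weights.
rng = np.random.default_rng(0)
n = 1000
a = rng.random((n, 2))
w = rng.random(n)

start = time.perf_counter()
result = cdist(a, a).sum(axis=1) * w   # same expression as in the answer
elapsed = time.perf_counter() - start

print(format_elapsed(elapsed))
```

On memory, the intermediate cdist matrix holds n × n float64 values, so about 8 MB for 1000 nodes and about 800 MB for 10000 nodes.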