How to group and calculate distance?

Question

I have a data set like the one below:

ID | X | Y | Z
--------------
 1 | 5 | 5 | 5
 1 | 4 | 2 | 0
 2 | 1 | 3 | 4
 .
 .
 .

I have ground truth values (x, y, z) for each ID. I want to calculate the distance between each row in the table above and the true values for its ID. I tried using df.groupby() but am not sure how to stick the df back together.

True values:

ID | X | Y | Z
---------------
 1 | 1 | 2 | 3
 2 | 4 | 5 | 6
 3 | 7 | 8 | 9
 .
 .

I expect the output to look like:

ID | X  | Y  | Z
-----------------
 1 |  4 |  3 |  2
 1 |  3 |  0 | -3
 2 | -3 | -2 | -2
 .
 .
 .

Also, what do you mean by `I have true (x,y,z) for each ID`? How do you calculate `True`? — Mayank Porwal, Apr 23 '20 at 23:34
@MayankPorwal if I understand correctly, he has a table of ground truths then a table of 'observations'. The ID column is used to identify which ground truth values to use. — Riley, Apr 23 '20 at 23:38
Yes, @Riley got it right. Sorry for the ambiguity. I want to calculate euclidean distance basically. I have observations and they have IDs to true locations. — MichaelMMeskhi, Apr 23 '20 at 23:46
Try: https://stackoverflow.com/questions/45227930/subtraction-of-pandas-dataframes. df.sub(df2.iloc[:,0],axis=0) — Merlin, Apr 23 '20 at 23:49
What is the issue, exactly? Have you tried anything, done any research? Please see How to Ask and the help center's on-topic guidance. — AMC, Apr 24 '20 at 03:12

Quang Hoang · Accepted Answer · 2020-04-24T00:09:24.037


You can set ID as the index on both frames and subtract. pandas then aligns the rows on the index (in this case, ID) for you:

df.set_index('ID').sub(ground_truths.set_index('ID')).reset_index()

Output (ID 3 appears in ground_truths but has no observations in the sample data, so its row is NaN):

   ID    X    Y    Z
0   1  4.0  3.0  2.0
1   1  3.0  0.0 -3.0
2   2 -3.0 -2.0 -2.0
3   3  NaN  NaN  NaN
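
If the all-NaN row for IDs without observations is unwanted, a left merge is one alternative to index alignment. The following is a minimal sketch, not part of the original answer: the sample frames are copied from the question, and the `_true` suffix and the names `coords`, `merged` and `diff` are illustrative choices. `validate="many_to_one"` makes pandas raise an error if `ground_truths` contains a repeated ID.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": [1, 1, 2], "X": [5, 4, 1], "Y": [5, 2, 3], "Z": [5, 0, 4]})
ground_truths = pd.DataFrame({"ID": [1, 2, 3], "X": [1, 4, 7], "Y": [2, 5, 8], "Z": [3, 6, 9]})

coords = ["X", "Y", "Z"]

# Attach the true coordinates to every observation. A left merge keeps the
# observations in their original order and drops IDs that were never observed.
merged = df.merge(
    ground_truths,
    on="ID",
    how="left",
    suffixes=("", "_true"),   # hypothetical suffix for the ground-truth columns
    validate="many_to_one",   # raises if ground_truths has duplicate IDs
)

# Observed minus true, column by column.
diff = merged[coords] - merged[[c + "_true" for c in coords]].to_numpy()
diff.insert(0, "ID", merged["ID"])
print(diff)
```

An observation whose ID is missing from `ground_truths` would get NaN differences here rather than being dropped.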

Update: for Euclidean distance:

tmp = df.set_index('ID').sub(ground_truths.set_index('ID'))

# this is the Euclidean part:
# you can also use other packages, e.g. np.linalg.norm
result = ((tmp**2).sum(axis=1))**0.5
result = result.reset_index()
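
For reference, here is a minimal, self-contained sketch of the same computation with `np.linalg.norm`. The sample frames are copied from the question. The `dropna()` call and the column name `distance` are my own additions and assume that IDs without observations should be left out of the result.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": [1, 1, 2], "X": [5, 4, 1], "Y": [5, 2, 3], "Z": [5, 0, 4]})
ground_truths = pd.DataFrame({"ID": [1, 2, 3], "X": [1, 4, 7], "Y": [2, 5, 8], "Z": [3, 6, 9]})

# Index-aligned subtraction as above. IDs that exist only in ground_truths
# produce NaN rows, which are dropped here (an assumption about what is wanted).
tmp = df.set_index("ID").sub(ground_truths.set_index("ID")).dropna()

# Row-wise Euclidean norm of the (X, Y, Z) differences.
distances = pd.Series(
    np.linalg.norm(tmp.to_numpy(), axis=1),
    index=tmp.index,
    name="distance",  # illustrative column name
)
result = distances.reset_index()
print(result)
```

Because `distances` has a name, `reset_index()` yields the columns `ID` and `distance` instead of an unnamed `0` column.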

edited Apr 24 '20 at 00:09

answered Apr 23 '20 at 23:50


The true values have unique IDs, say up to 30. But the values I have collected, which have repeated IDs with different x, y, z, number more than 30. – MichaelMMeskhi Apr 24 '20 at 00:02
@MichaelMMeskhi yes, this code should work. The fact that `ground_truths` has unique IDs is important. – Quang Hoang Apr 24 '20 at 00:03
I just tried it and I believe it works. But as I mentioned, I am trying to find the Euclidean distance, so how would I pass these to a function or.. – MichaelMMeskhi Apr 24 '20 at 00:05
Awesome! Thank you. – MichaelMMeskhi Apr 24 '20 at 00:14
