I have a dataset like the one below:

ID | X | Y | Z
--------------
 1 | 5 | 5 | 5
 1 | 4 | 2 | 0
 2 | 1 | 3 | 4
 .
 .
 .

I have ground truth values (x, y, z) for each ID. I want to calculate the distance between each row in the table above and the true values for its ID. I tried df.groupby(), but I'm not sure how to put the DataFrame back together afterwards.

True values:

ID | X | Y | Z
---------------
 1 | 1 | 2 | 3
 2 | 4 | 5 | 6
 3 | 7 | 8 | 9
 .
 .

I expect the output to look like:

ID | X  | Y  | Z
-----------------
 1 |  4 |  3 |  2
 1 |  3 |  0 | -3
 2 | -3 | -2 | -2
 .
 .
 .
Riley
MichaelMMeskhi
  • What is the expected output based on sample input? – Mayank Porwal Apr 23 '20 at 23:31
  • Also, what do you mean by `I have true (x,y,z) for each ID`? How do you calculate `True`? – Mayank Porwal Apr 23 '20 at 23:34
  • @MayankPorwal if I understand correctly, he has a table of ground truths then a table of 'observations'. The ID column is used to identify which ground truth values to use. – Riley Apr 23 '20 at 23:38
  • Yes, @Riley got it right. Sorry for the ambiguity. Basically, I want to calculate the Euclidean distance. I have observations, and their IDs link them to the true locations. – MichaelMMeskhi Apr 23 '20 at 23:46
  • Try: https://stackoverflow.com/questions/45227930/subtraction-of-pandas-dataframes. df.sub(df2.iloc[:,0],axis=0) – Merlin Apr 23 '20 at 23:49
  • What is the issue, exactly? Have you tried anything, done any research? Please see [ask], [help/on-topic]. – AMC Apr 24 '20 at 03:12

1 Answer

You can set ID as the index on both DataFrames and subtract. Pandas then aligns the rows on the index (here, ID) for you:

df.set_index('ID').sub(ground_truths.set_index('ID')).reset_index()

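Output (with only the sample rows shown, ID 3 exists in ground_truths but not in df, so its row is NaN and the columns become floats):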

   ID    X    Y    Z
0   1  4.0  3.0  2.0
1   1  3.0  0.0 -3.0
2   2 -3.0 -2.0 -2.0
3   3  NaN  NaN  NaN
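
If the extra NaN row and the float columns are unwanted, one alternative is to look up the ground-truth row for each observation explicitly and subtract the raw arrays. This is only a sketch built on the sample rows from the question; the names `cols`, `truth_rows`, and `diff` are made up for illustration, and it assumes every ID in `df` also exists in `ground_truths` (otherwise `.loc` raises a `KeyError`) and that IDs are unique in `ground_truths`.

```python
import pandas as pd

# Sample rows from the question (only the rows that are shown there).
df = pd.DataFrame({"ID": [1, 1, 2], "X": [5, 4, 1], "Y": [5, 2, 3], "Z": [5, 0, 4]})
ground_truths = pd.DataFrame(
    {"ID": [1, 2, 3], "X": [1, 4, 7], "Y": [2, 5, 8], "Z": [3, 6, 9]}
)

cols = ["X", "Y", "Z"]

# One ground-truth row per observation, in the same order as df.
# Repeated IDs in df simply repeat the matching ground-truth row.
truth_rows = ground_truths.set_index("ID").loc[df["ID"], cols].to_numpy()

# Subtract position by position; the integer dtype and row order of df are kept.
diff = df.copy()
diff[cols] = df[cols].to_numpy() - truth_rows

print(diff)
```

Because the result has exactly one row per row of `df`, there is no leftover row for ground-truth IDs that were never observed.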

Update: for Euclidean distance:

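# per-ID differences; rows are aligned on the ID index
tmp = df.set_index('ID').sub(ground_truths.set_index('ID'))

# this is the Euclidean part: square, sum across X/Y/Z, take the square root
# you can use other packages, e.g. np.linalg.norm
result = ((tmp**2).sum(axis=1))**0.5
result = result.reset_index()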
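
As a rough sketch of the NumPy variant, the row-wise norm can be taken with `np.linalg.norm(..., axis=1)`. The data below is just the sample from the question, and the column name `distance` is an arbitrary choice. One difference worth knowing: `sum(axis=1)` skips NaN by default, so an ID that appears only in `ground_truths` would get a distance of 0.0 with the sum-based version, whereas `np.linalg.norm` propagates the NaN.

```python
import numpy as np
import pandas as pd

# Sample rows from the question (only the rows that are shown there).
df = pd.DataFrame({"ID": [1, 1, 2], "X": [5, 4, 1], "Y": [5, 2, 3], "Z": [5, 0, 4]})
ground_truths = pd.DataFrame(
    {"ID": [1, 2, 3], "X": [1, 4, 7], "Y": [2, 5, 8], "Z": [3, 6, 9]}
)

# Per-ID differences, aligned on the ID index as in the answer.
tmp = df.set_index("ID").sub(ground_truths.set_index("ID"))

# Row-wise Euclidean norm; rows that are all NaN stay NaN.
result = pd.Series(
    np.linalg.norm(tmp.to_numpy(), axis=1), index=tmp.index, name="distance"
).reset_index()

# Optionally drop IDs that had no observation in df.
observed = result.dropna(subset=["distance"])

print(observed)
```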
Quang Hoang