1

I was wondering if there is a one-line (or two!) solution instead of using multiple for-loops for the following problem:

I want to calculate the difference between any $weight value if they are for the same $ID, $orient, and $direct:

$ID $spec   $view   $orient $direct $weight
9247    1   post    stance      0       2038.66 
9247    2   post    stance      15b     2177.74 
9247    4   post    stance      15f     1559.62 
9247    5   ant     stance      15b     2271.89     
9247    6   ant     stance      0       2075.44     
9247    7   ant     stance      15f     1438.31     
9247    8   post    fall        15a     1665.60     
9247    9   post    fall        15p     1742.82     
9119    1   ant     fall        0       994.48      
9119    2   ant     fall        15b     1081.44     
9119    3   post    fall        15b     1024.18 
9119    4   post    fall        0       1093.46 
9119    5   post    stance      15a     1220.13     
9119    6   post    stance      15p     1089.72     
9119    7   post    fall        15f     1056.21     

For example, the difference between the first and fifth line should be calculated (2038.66 - 2075.44 = −36.78) and so on, and written in such a new dataframe:

$ID $orient $direct $weight-difference
9247    stance      0       −36.78  
9247    stance      15b     −94.15

Thanks!

Yuca
  • 6,010
  • 3
  • 22
  • 42
lvl0l27
  • 47
  • 6

1 Answers1

1

Use series.diff with groupby:

df.assign(difference=df.groupby(['$ID','$orient','$direct'])['$weight'].diff().mul(-1))
                                                                      .dropna()

     $ID  $spec $view $orient $direct  $weight  difference
3   9247      5   ant  stance     15b  2271.89      -94.15
4   9247      6   ant  stance       0  2075.44      -36.78
5   9247      7   ant  stance     15f  1438.31      121.31
10  9119      3  post    fall     15b  1024.18       57.26
11  9119      4  post    fall       0  1093.46      -98.98
anky
  • 74,114
  • 11
  • 41
  • 70
  • May I also ask how the two values of each pair can be written in separate columns (instead of calculating their difference)? One column the values with $view==ant and the other the $view==post – lvl0l27 Jan 17 '20 at 16:43
  • @Silent That would require a pivot but thats out of scope for the currebnt question: Check [this](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) for different methods of pivot (for your usecase i think Q10) – anky Jan 17 '20 at 16:46