df:

    Time  Name   X  Y
0   00    AA     0  0
1   30    BB     1  1
2   45    CC     2  2
3   60    GG:AB  3  3
4   90    GG:AC  4  4
5   120   AA     5  3

dataGroup = df.groupby([pd.Grouper(key='Time', freq='30s'), 'Name']).sort_values(by=['Timestamp'], ascending=True)

I have tried calling diff() within each group, but it returns NaN or other values I did not expect.

df.groupby('Name', sort=False)['X'].diff()

How do I keep the groupings and the time sort, and compute the diff between each row and its previous row (for both the X and the Y columns)?

Expected output: within each group (e.g. AA), XDiff for the first row is X minus the known origin, and XDiff for each later row is X minus the previous row's X (row 2: X row2 - X row1).

    Time  Name   X  Y  XDiff  YDiff
0   00    AA     0  0  0      0
5   120   AA     5  3  5      3
1   30    BB     1  1  0      0
6   55    BB     2  3  1      2
2   45    CC     2  2  0      0
3   60    GG:AB  3  3  0      0
4   90    GG:AC  4  4  0      0

It would also be nice to see the total distance for each group (i.e., AA is 5, BB is 1). My example has only a couple of rows per group, but if a group had 100 rows, diff would give me the distance between each consecutive pair, not the total distance for that group.

wegunterjr
  • Can you post the expected output? – harvpan Nov 16 '18 at 16:07
  • related / possible duplicate: https://stackoverflow.com/questions/20648346/computing-diffs-within-groups-of-a-dataframe – Evan Nov 16 '18 at 16:41
  • Can you clarify what you mean by "total distance"? – Evan Nov 16 '18 at 16:53
  • Possible duplicate of [Computing diffs within groups of a dataframe](https://stackoverflow.com/questions/20648346/computing-diffs-within-groups-of-a-dataframe) – Evan Nov 16 '18 at 16:53

1 Answer


Ripping off https://stackoverflow.com/a/20664760/6672746, you can use a lambda function to calculate the difference between rows for X and Y. I also included two lines to set the index (after the groupby) and sort it.

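# Difference from the previous row within each Name group; a group's first row has no predecessor, so its NaN is filled with 0
df['x_diff'] = df.groupby(['Name'])['X'].transform(lambda x: x.diff()).fillna(0)
df['y_diff'] = df.groupby(['Name'])['Y'].transform(lambda x: x.diff()).fillna(0)
# Index by group and time, then sort so each group's rows sit together in time order
df.set_index(["Name", "Time"], inplace=True)
df.sort_index(level=["Name", "Time"], inplace=True)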

Output:

            X  Y  x_diff  y_diff
Name  Time                      
AA    0     0  0     0.0     0.0
      120   5  3     5.0     3.0
BB    30    1  1     0.0     0.0
CC    45    2  2     0.0     0.0
GG:AB 60    3  3     0.0     0.0
GG:AC 90    4  4     0.0     0.0
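
For the "total distance" part of the question, here is a minimal sketch rather than a definitive answer. It assumes "total distance" means the sum of the absolute step sizes within each group, which the question does not confirm. It also assumes Time can be treated as plain integers. The names `totals`, `x_total` and `y_total` are made up for illustration, and the frame contains only the six rows from the question's df.

```python
import pandas as pd

# Sample frame from the question; Time is kept as plain integers (assumption).
df = pd.DataFrame({
    "Time": [0, 30, 45, 60, 90, 120],
    "Name": ["AA", "BB", "CC", "GG:AB", "GG:AC", "AA"],
    "X": [0, 1, 2, 3, 4, 5],
    "Y": [0, 1, 2, 3, 4, 3],
})

# Sort by group and time first, so diff() compares each row with the
# chronologically previous row of the same group.
df = df.sort_values(["Name", "Time"])

grouped = df.groupby("Name", sort=False)
# The first row of each group has no predecessor, so its NaN becomes 0.
df["x_diff"] = grouped["X"].diff().fillna(0)
df["y_diff"] = grouped["Y"].diff().fillna(0)

# Assumed reading of "total distance": sum of absolute per-step differences.
totals = (
    df.assign(x_abs=df["x_diff"].abs(), y_abs=df["y_diff"].abs())
      .groupby("Name")[["x_abs", "y_abs"]]
      .sum()
      .rename(columns={"x_abs": "x_total", "y_abs": "y_total"})
)
print(totals)
```

Summing absolute values keeps back-and-forth movement from cancelling out. If net displacement is what you want instead, sum `x_diff` and `y_diff` directly.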
Evan