Group data using pandas: how do I keep the group order and compute row-to-row differences on two columns?

Question

df:

    Time Name  X  Y
0   00   AA    0  0
1   30   BB    1  1
2   45   CC    2  2
3   60   GG:AB 3  3
4   90   GG:AC 4  4
5   120  AA    5  3

dataGroup = df.groupby([pd.Grouper(key='Time', freq='30s'), 'Name']).sort_values(by=['Timestamp'], ascending=True)

I have tried calling diff() on the rows, but it returns NaN or otherwise unexpected values.

df.groupby('Name', sort=False)['X'].diff()

How do I keep the groupings and the time sort, and take the diff between each row and its previous row (for both the X and the Y columns)?

Expected output: within each group (for example, AA), XDiff for row 1 = X(row 1) - origin (known), and XDiff for row 2 = X(row 2) - X(row 1).

    Time Name  X  Y XDiff  YDiff
0   00   AA    0  0  0       0
5   120  AA    5  3  5       3
1   30   BB    1  1  0       0
6   55   BB    2  3  1       2
2   45   CC    2  2  0       0
3   60   GG:AB 3  3  0       0
4   90   GG:AC 4  4  0       0

It would also be nice to see the total distance for each group (i.e., AA is 5, BB is 1). My example has only a couple of rows per group, but if there were 100 of them, diff would give me the distance between any two consecutive rows, not the total distance for that group.

related / possible duplicate: https://stackoverflow.com/questions/20648346/computing-diffs-within-groups-of-a-dataframe — Evan, Nov 16 '18 at 16:41
Possible duplicate of [Computing diffs within groups of a dataframe](https://stackoverflow.com/questions/20648346/computing-diffs-within-groups-of-a-dataframe) — Evan, Nov 16 '18 at 16:53

Evan · Answer 1 · 2018-11-16T16:49:53.247

Ripping off https://stackoverflow.com/a/20664760/6672746, you can use a lambda function to calculate the difference between rows for X and Y. I also included two lines to set the index (after the groupby) and sort it.

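# Row-to-row difference within each Name group; the first row of a group has no previous row, so its NaN is filled with 0
df['x_diff'] = df.groupby(['Name'])['X'].transform(lambda x: x.diff()).fillna(0)
df['y_diff'] = df.groupby(['Name'])['Y'].transform(lambda x: x.diff()).fillna(0)
# Index by (Name, Time) and sort so each group's rows sit together in time order
df.set_index(["Name", "Time"], inplace=True)
df.sort_index(level=["Name", "Time"], inplace=True)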

Output:

            X  Y  x_diff  y_diff
Name  Time                      
AA    0     0  0     0.0     0.0
      120   5  3     5.0     3.0
BB    30    1  1     0.0     0.0
CC    45    2  2     0.0     0.0
GG:AB 60    3  3     0.0     0.0
GG:AC 90    4  4     0.0     0.0
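
The question also asks for a total distance per group, which the diff columns alone do not give. Below is a minimal, self-contained sketch of one way to get it, not a definitive answer. The rows come from the question's tables, including the extra BB row (index 6, Time 55) that appears only in the expected output. It assumes "total distance" means the sum of the absolute row-to-row differences within each group; the names `diffs`, `totals`, and `result` are made up for illustration.

```python
import pandas as pd

# Rows from the question's input table plus row 6 (BB at Time 55),
# which appears only in the question's expected output.
df = pd.DataFrame(
    {
        "Time": [0, 30, 45, 60, 90, 120, 55],
        "Name": ["AA", "BB", "CC", "GG:AB", "GG:AC", "AA", "BB"],
        "X": [0, 1, 2, 3, 4, 5, 2],
        "Y": [0, 1, 2, 3, 4, 3, 3],
    }
)

# Sort by time first so each diff compares a row with its predecessor in time.
df = df.sort_values("Time")

# Diff both columns in one groupby call. The first row of each group has no
# predecessor and comes back as NaN, which is replaced with 0 here.
diffs = df.groupby("Name", sort=False)[["X", "Y"]].diff().fillna(0)
df["x_diff"] = diffs["X"]
df["y_diff"] = diffs["Y"]

# Assumption: total distance = sum of absolute steps within each group.
totals = df.groupby("Name")[["x_diff", "y_diff"]].agg(lambda s: s.abs().sum())

# Keep each group's rows together, in time order inside the group.
result = df.sort_values(["Name", "Time"])

print(result)
print(totals)
```

With this data, `totals` gives 5 for AA and 1 for BB in the `x_diff` column, which matches the figures in the question. If a signed net displacement is wanted instead, dropping the `.abs()` would be the change to make.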
