import pandas as pd

df = pd.DataFrame(
    {'ts':[1,2,3,4,60,61,62,63,64,150,155,156,
           1,2,3,4,60,61,62,63,64,150,155,156,
           1,2,3,4,60,61,62,63,64,150,155,156],
    'id': [1,2,3,4,60,61,62,63,64,150,155,156,
           71,72,73,74,80,81,82,83,64,160,165,166,
           21,22,23,24,90,91,92,93,94,180,185,186],
    'other':['x','x','x','x','x','x','x','x','x','x','x','x',
             'y','y','y','y','y','y','y','y','y','y','y','y',
             'z','z','z','z','z','z','z','z','z','z','z','z'],
    'user':['x','x','x','x','y','x','x','x','x','x','x','x',
            'y','y','y','y','x','y','y','y','y','y','y','y',
            'z','z','z','z','z','z','z','z','z','z','z','z']
    })


df.set_index('id', inplace=True)
df.sort_values('ts',inplace=True)


for x, g in df.groupby('user'):
    # call 1
    print(g.ts.diff())

# call 2
df.groupby('user').ts.diff()

I'm not sure why call 2 raises an error. I also noticed that call 2 succeeds when I remove the sort_values call.

Can somebody please explain this behavior?

mkmostafa
  • Neither the answer here nor the linked answer (which was asked many years ago regarding a much earlier pandas version) gives explanation on why the `groupby.diff()` method fails. I'm still getting this error on pandas 0.24.1 – jf328 Mar 15 '19 at 11:01

1 Answer


I get the error regardless of whether the sort is called or not. In any case, I think what you're looking for is something like:

>>> df['group_diff'] = df.ts.groupby(df.user).transform(pd.Series.diff)
>>> df.head()
   other  ts user  group_diff
id
1      x   1    x         NaN
2      x   2    x         1.0
3      x   3    x         1.0
4      x   4    x         1.0
60     x  60    y         NaN

After the groupby, transform applies a function to each group and returns one entry per original row, aligned to the original index. Here the function is simply pd.Series.diff.

Note the NaN at positions 0 and 4: these are the first rows of the x and y groups, respectively, where there is no previous value to subtract.
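To make the per-group behavior easier to see, here is a minimal sketch on a smaller frame. The ids, timestamps and users below are made up for illustration and are not the data from the question. The rows of the two users are interleaved on purpose, to show that the result stays aligned to the original row order.

```python
import pandas as pd

# Hypothetical toy data (not from the question): two users, interleaved rows.
df = pd.DataFrame(
    {
        "id": [10, 11, 12, 13, 14, 15],
        "ts": [1, 2, 60, 4, 61, 63],
        "user": ["x", "x", "y", "x", "y", "y"],
    }
).set_index("id")

# Apply Series.diff within each user's rows; the result keeps df's index.
df["group_diff"] = df.groupby("user")["ts"].transform(pd.Series.diff)

print(df)
```

Each user's first row has no predecessor within its group, so it gets NaN. Every other row holds the difference from the previous ts of the same user, even when rows of another user sit in between.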

Ami Tavory