
I want to lag every column in a dataframe, by group. I have a frame like this:

import numpy as np
import pandas as pd

index = pd.date_range('2015-11-20', periods=6, freq='D')

df = pd.DataFrame(dict(time=index, grp=['A']*3 + ['B']*3, col1=[1,2,3]*2,
    col2=['a','b','c']*2)).set_index(['time','grp'])

which looks like

                col1 col2
time       grp           
2015-11-20 A       1    a
2015-11-21 A       2    b
2015-11-22 A       3    c
2015-11-23 B       1    a
2015-11-24 B       2    b
2015-11-25 B       3    c

and I want it to look like this:

                col1 col2 col1_lag col2_lag
time       grp                     
2015-11-20 A       1    a        2        b
2015-11-21 A       2    b        3        c
2015-11-22 A       3    c       NA       NA
2015-11-23 B       1    a        2        b
2015-11-24 B       2    b        3        c
2015-11-25 B       3    c       NA       NA

This question handles the single-column case, but I have an arbitrary number of columns and want to lag all of them. I can use groupby and apply, but apply runs the shift function over each column independently, and it doesn't seem to like receiving an [nrow, 2]-shaped dataframe in return. Is there a function like apply that acts on the whole group sub-frame, or is there a better way to do this?
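For reference, the per-column version from that question would look roughly like this on the frame above (a sketch), which is exactly the repetition I'm trying to avoid when there are many columns:

# lag one column at a time within each group
df['col1_lag'] = df.groupby(level='grp')['col1'].shift(-1)
df['col2_lag'] = df.groupby(level='grp')['col2'].shift(-1)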


2 Answers


IIUC, you can simply group on level="grp" and then shift by -1:

>>> shifted = df.groupby(level="grp").shift(-1)
>>> df.join(shifted.rename(columns=lambda x: x+"_lag"))
                col1 col2  col1_lag col2_lag
time       grp                              
2015-11-20 A       1    a         2        b
2015-11-21 A       2    b         3        c
2015-11-22 A       3    c       NaN      NaN
2015-11-23 B       1    a         2        b
2015-11-24 B       2    b         3        c
2015-11-25 B       3    c       NaN      NaN
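Equivalently, assuming a pandas version that has DataFrame.add_suffix, the column renaming can be written without the lambda:

>>> df.join(df.groupby(level="grp").shift(-1).add_suffix("_lag"))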
  • Great, thanks, I can't remember why I thought I needed to do it with `apply` - maybe it'll come to me later. – naught101 Nov 25 '15 at 22:56
  • Hrm.. I just realised that the reason I was trying it a different way is because the format `.shift(periods=1, freq='1D')` doesn't work with `.groupby` - I actually already [reported this as a bug](https://github.com/pydata/pandas/issues/11452) - must have been having a bad week last week. It's since been changed in pandas so that it gives a not-implemented error. – naught101 Nov 30 '15 at 02:30
  • Any thoughts on how one could take this exact same approach but grouping by two columns instead of just the one? – Ricky Aug 26 '21 at 16:16

I don't have enough reputation to reply to Ricky's question in a comment, but you can simply pass the additional fields to groupby as a list, like so:

>>> shifted = df.groupby(["first_col", "second_col"]).shift(-1)

and then proceed with any subsequent steps as normal. Keep in mind that if the grouping keys live in the index, as in the original question, the shifted frame keeps that MultiIndex, so if you want to flatten it afterwards you will need to call

>>> df_new = shifted.reset_index()

at the end. Hope this helps.
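To make the two-column case concrete, here is a minimal end-to-end sketch; the frame and the grp1/grp2 column names are made up for illustration, and it uses only groupby, shift, rename and join, as in the accepted answer:

import pandas as pd

# illustrative frame with two grouping columns (hypothetical names grp1/grp2)
df = pd.DataFrame({
    'grp1': ['A', 'A', 'A', 'B', 'B', 'B'],
    'grp2': ['x', 'x', 'y', 'x', 'y', 'y'],
    'col1': [1, 2, 3, 1, 2, 3],
    'col2': ['a', 'b', 'c', 'a', 'b', 'c'],
})

# shift every non-grouping column by -1 within each (grp1, grp2) group;
# the grouping columns themselves are excluded from the shifted result
shifted = df.groupby(['grp1', 'grp2']).shift(-1)

# attach the lagged columns next to the originals (indexes already line up)
out = df.join(shifted.rename(columns=lambda x: x + '_lag'))
print(out)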