-1

I've got a pandas dataframe. I want to 'lag' one of my column conditionally on an other columns. Typically below, i want to lag gdp based on y previous "real" value.

Typically first entry will be 0 because we are looking for y = 0.

Second entry will be 2 because we are looking for y = 1

Third entry will be zero because we are looking for y = 3

etc...

df = 
      y  gdp  cap
  0   1    2    5
  1   2    3    9
  2   4    7    2
  3   5    4    7
  4   6    7    7

df_lag =
    y  gdp  cap  y_prev gdp_lag
0   1    2    5  0      0
1   2    3    9  1      2
2   4    7    2  3      0
3   5    4    7  4      7
4   6    7    7  5      4  

Is there any simple way to do that?

Anton Protopopov
  • 30,354
  • 12
  • 88
  • 93
Vincent
  • 1,616
  • 2
  • 16
  • 18

1 Answers1

2

You do not need the shift method, you can simply do a lookup. I make y the index of a simple series, and use .loc to access the values:

df['y_prev'] = df['y'] - 1
df[['y', 'gdp']].set_index('y', drop=True).loc[df['y_prev']]

Output:

   gdp
y     
0  NaN
1  2.0
3  NaN
4  7.0
5  4.0

In order to assign these values to a new column, you need to get rid of the index (with .values):

df['gdp_lag'] = df[['y', 'gdp']].set_index('y', drop=True).loc[df['y_prev']].values
IanS
  • 15,771
  • 9
  • 60
  • 84