Use .assign() and a dictionary comprehension to create multiple lags of one column

Question

I have a data frame df and I want to create multiple lags of column A. I should be able to use the .assign() method and a dictionary comprehension, I think. However, all lags are the longest lag with my solution below, even though the dictionary comprehension itself creates the correct lags. Also, can someone explain why I need the ** just before my dictionary comprehension?

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(5)})

df.assign(**{'lag_' + str(i): lambda x: x['A'].shift(i) for i in range(1, 5+1)})

    A   lag_1   lag_2   lag_3   lag_4   lag_5
0   0   NaN     NaN     NaN     NaN     NaN
1   1   NaN     NaN     NaN     NaN     NaN
2   2   NaN     NaN     NaN     NaN     NaN
3   3   NaN     NaN     NaN     NaN     NaN
4   4   NaN     NaN     NaN     NaN     NaN

The dictionary comprehension itself creates the correct lags.

{'lag_' + str(i): df['A'].shift(i) for i in range(1, 5+1)}

{'lag_1': 0    NaN
 1    0.0
 2    1.0
 3    2.0
 4    3.0
 Name: A, dtype: float64,
 'lag_2': 0    NaN
 1    NaN
 2    0.0
 3    1.0
 4    2.0
 Name: A, dtype: float64,
 'lag_3': 0    NaN
 1    NaN
 2    NaN
 3    0.0
 4    1.0
 Name: A, dtype: float64,
 'lag_4': 0    NaN
 1    NaN
 2    NaN
 3    NaN
 4    0.0
 Name: A, dtype: float64,
 'lag_5': 0   NaN
 1   NaN
 2   NaN
 3   NaN
 4   NaN
 Name: A, dtype: float64}

Why are you using `lambda`? Note that in this case, `lambda x: x['A'].shift(i)` creates a function with a free variable, `i`. Python closures are lexically scoped and late-binding: `i` is looked up when the lambda is *called*, not when it is created. Because `assign` calls the lambdas only after the dictionary comprehension has finished, `i` will **always refer to the `i` inside the scope of your dictionary comprehension**, so every lambda ends up using the *last* `i` in the loop (5), and `shift(5)` on a five-row frame is all `NaN`. But `lambda` doesn't seem to serve any purpose here. It just adds an extra layer of indirection and, in this case, introduces a bug. – juanpa.arrivillaga, Apr 13 '21 at 21:57
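A minimal pure-Python sketch of the late-binding behavior described in this comment (no pandas involved; `late` and `early` are just illustrative names). The second comprehension uses the default-argument trick, which is one common way to freeze the loop value at the moment each lambda is created:

```python
# Late binding: each lambda looks up `i` when it is called, not when it is created.
# By the time we call them, the comprehension has finished and `i` is 5.
late = {i: (lambda: i) for i in range(1, 6)}
print([f() for f in late.values()])  # [5, 5, 5, 5, 5]

# A default argument is evaluated when the lambda is defined,
# so each lambda keeps the value `i` had in that iteration.
early = {i: (lambda i=i: i) for i in range(1, 6)}
print([f() for f in early.values()])  # [1, 2, 3, 4, 5]
```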

score 2 · Answer 1 · answered Apr 13 '21 at 21:54


Just pass the dictionary you built directly and remove the `lambda`:

# Pass the Series directly (no lambda), so each shift(i) is evaluated inside its own iteration
out = df.assign(**{'lag_' + str(i): df['A'].shift(i) for i in range(1, 5+1)})
print(out)
   A  lag_1  lag_2  lag_3  lag_4  lag_5
0  0    NaN    NaN    NaN    NaN    NaN
1  1    0.0    NaN    NaN    NaN    NaN
2  2    1.0    0.0    NaN    NaN    NaN
3  3    2.0    1.0    0.0    NaN    NaN
4  4    3.0    2.0    1.0    0.0    NaN


BENY


Do you have any intuition for why my code fails? And why is `**` required before my dictionary comprehension? – Richard Herron Apr 13 '21 at 21:57

@RichardHerron because `assign` accepts only keyword arguments; it doesn't take a dictionary as a positional argument. When you use `**` on a mapping in a function call, it unpacks the key-value pairs into keyword arguments. – juanpa.arrivillaga Apr 13 '21 at 21:58
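A small sketch of what `**` does, assuming a hypothetical helper `show_kwargs` that simply returns the keyword arguments it receives. It also checks that the dictionary form of `assign` matches spelling the keywords out by hand, and that passing the dictionary positionally raises a `TypeError`, since `assign` accepts only keyword arguments:

```python
import numpy as np
import pandas as pd

def show_kwargs(**kwargs):
    # Hypothetical helper: kwargs arrives as a dict built from the keyword arguments
    return kwargs

lags = {'lag_1': 1, 'lag_2': 2}
# These two calls are equivalent: ** expands the mapping into keyword arguments
print(show_kwargs(**lags) == show_kwargs(lag_1=1, lag_2=2))  # True

df = pd.DataFrame({'A': np.arange(5)})
via_dict = df.assign(**{'lag_' + str(i): df['A'].shift(i) for i in range(1, 3)})
via_kwargs = df.assign(lag_1=df['A'].shift(1), lag_2=df['A'].shift(2))
print(via_dict.equals(via_kwargs))  # True

# Without **, the dict is passed as a positional argument, which assign() rejects
try:
    df.assign({'lag_1': df['A'].shift(1)})
except TypeError as exc:
    print('TypeError:', exc)
```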
@juanpa.arrivillaga Thanks! And for anyone looking for the intuition of why the code fails, see juanpa's comment on my question. – Richard Herron Apr 13 '21 at 22:00

@RichardHerron see the linked duplicate: it explains in detail what is going on and shows how you could get this to work using a function, although you don't need a function in this case. – juanpa.arrivillaga Apr 13 '21 at 22:01
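One way to make the function version work, sketched with a hypothetical factory `make_lag` (not a pandas API, and not necessarily the approach used in the linked duplicate). Each call to the factory creates a fresh scope, so each returned lambda keeps its own `n` instead of sharing the comprehension's `i`:

```python
import numpy as np
import pandas as pd

def make_lag(col, n):
    # Hypothetical factory: each call creates a new scope,
    # so the returned lambda captures this call's `n`
    return lambda d: d[col].shift(n)

df = pd.DataFrame({'A': np.arange(5)})
# assign() calls each lambda with the DataFrame being built
out = df.assign(**{'lag_' + str(i): make_lag('A', i) for i in range(1, 5 + 1)})
print(out)
```

The result matches the lambda-free version in the answer, so the extra function only pays off if you actually need the callable form.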
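With `lambda`: `df.assign(**{'lag_' + str(i): lambda df_, i=i: df_['A'].shift(i) for i in range(1, 5+1)})`. The default argument `i=i` binds the current value of `i` when each lambda is created. This form is useful in method chains, because `df_` receives the latest state of the DataFrame in the pipeline. – the_RR Dec 11 '22 at 19:41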
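A sketch of why the callable form is handy in a method chain, assuming an illustrative first step that rescales `A` (not part of the original question). The lambda receives the DataFrame produced by the previous step, so the lags are computed from the rescaled column rather than from the original `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(5)})

out = (
    df
    # Illustrative first step: rescale A
    .assign(A=lambda df_: df_['A'] * 10)
    # df_ is the DataFrame from the previous step, so the lags use the scaled A;
    # i=i freezes each iteration's value as a default argument
    .assign(**{'lag_' + str(i): (lambda df_, i=i: df_['A'].shift(i)) for i in range(1, 3)})
)
print(out)
```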
