I have a data frame df and I want to create multiple lags of column A.
I think I should be able to use the .assign() method and a dictionary comprehension.
However, with my solution below, every lag column ends up equal to the longest lag (lag 5), even though the dictionary comprehension itself creates the correct lags.
Also, can someone explain why I need the ** just before my dictionary comprehension?
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.arange(5)})
df.assign(**{'lag_' + str(i): lambda x: x['A'].shift(i) for i in range(1, 5+1)})
   A  lag_1  lag_2  lag_3  lag_4  lag_5
0  0    NaN    NaN    NaN    NaN    NaN
1  1    NaN    NaN    NaN    NaN    NaN
2  2    NaN    NaN    NaN    NaN    NaN
3  3    NaN    NaN    NaN    NaN    NaN
4  4    NaN    NaN    NaN    NaN    NaN
When I evaluate the dictionary comprehension on its own, calling df['A'].shift(i) directly instead of wrapping it in a lambda, it creates the correct lags:
{'lag_' + str(i): df['A'].shift(i) for i in range(1, 5+1)}
{'lag_1': 0    NaN
 1    0.0
 2    1.0
 3    2.0
 4    3.0
 Name: A, dtype: float64,
 'lag_2': 0    NaN
 1    NaN
 2    0.0
 3    1.0
 4    2.0
 Name: A, dtype: float64,
 'lag_3': 0    NaN
 1    NaN
 2    NaN
 3    0.0
 4    1.0
 Name: A, dtype: float64,
 'lag_4': 0    NaN
 1    NaN
 2    NaN
 3    NaN
 4    0.0
 Name: A, dtype: float64,
 'lag_5': 0    NaN
 1    NaN
 2    NaN
 3    NaN
 4    NaN
 Name: A, dtype: float64}
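
For what it's worth, here is a minimal sketch of what I suspect is going on, assuming the cause is Python's late-binding closures: each lambda looks up i only when .assign() calls it, and by then the comprehension has finished and i is 5 for every lambda. Binding i as a default argument (lambda x, i=i: ...) is one common workaround. The helper show_kwargs is hypothetical and only there to illustrate what ** does; it is not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(5)})

# Each lambda resolves `i` when it is called, not when it is defined.
# The default argument i=i captures the current value of i per lambda,
# so every column gets its own lag instead of the last one.
lagged = df.assign(**{
    'lag_' + str(i): lambda x, i=i: x['A'].shift(i)
    for i in range(1, 5 + 1)
})


# Hypothetical helper, only to show what ** does: it unpacks a dict into
# keyword arguments, so assign(**{'lag_1': f}) is the same call as
# assign(lag_1=f).
def show_kwargs(**kwargs):
    return kwargs


unpacked = show_kwargs(**{'lag_1': 1, 'lag_2': 2})
```

Without the **, .assign() would receive the dictionary as a single positional argument, which it does not accept, since it expects each new column as a keyword argument.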