Pandas Shift: Looking for better alternative

Question

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 0, 0], [4, 5, 0], [7, 7, 7], [7, 4, 5], [4, 5, 0], [7, 8, 9], [3, 2, 9], [9, 3, 6], [6, 8, 5]]), 
              columns=['a', 'b', 'c'], 
              index = ['1/1/2000', '1/1/2001', '1/1/2002', '1/1/2003', '1/1/2004', '1/1/2005', '1/1/2006', '1/1/2007', '1/1/2008'])

df['a_1'] = df['a'].shift(1)
df['a_3'] = df['a'].shift(3)
df['a_5'] = df['a'].shift(5)
df['a_7'] = df['a'].shift(7)

Above is a dummy example of how I am shifting. Issues: 1. Each shift period needs its own line; can it be done in one go? 2. The DataFrame above is small, but on a massive DataFrame this operation is slow. I checked other questions: most attribute the slowness to shift not being Cython-optimized. Is there a faster way (apart from numba, which a few answers mention)?

TL;DR -- `pd.concat([df[["a"]].shift(i).add_suffix(f"_{i}") for i in range(1, 2*n, 2)], axis=1)` is how you can do it "in one go" for `n` number of odd-spaced shifts. In terms of speed, you might have issues finding something faster. It would help if you could explain what you are defining as "slow" here. — ddejohn, Nov 29 '22 at 06:01

score 0 · Answer 1 · answered Nov 29 '22 at 06:05

nums = [1, 3, 5, 7]
pd.concat([df] + [df['a'].shift(i).to_frame(f'a_{i}') for i in nums], axis=1)

result:

            a   b   c   a_1 a_3 a_5 a_7
1/1/2000    1   0   0   NaN NaN NaN NaN
1/1/2001    4   5   0   1.0 NaN NaN NaN
1/1/2002    7   7   7   4.0 NaN NaN NaN
1/1/2003    7   4   5   7.0 1.0 NaN NaN
1/1/2004    4   5   0   7.0 4.0 NaN NaN
1/1/2005    7   8   9   4.0 7.0 1.0 NaN
1/1/2006    3   2   9   7.0 7.0 4.0 NaN
1/1/2007    9   3   6   3.0 4.0 7.0 1.0
1/1/2008    6   8   5   9.0 7.0 7.0 4.0
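
On the speed part of the question, one option worth trying is to build all lagged columns in a single preallocated NumPy array and wrap it in a DataFrame once, rather than calling shift per column. This is only a minimal sketch: the helper name `lagged_columns` is made up for illustration, it assumes non-negative integer lags and a numeric column, and whether it beats shift on a given dataset has to be measured (for example with `timeit`).

```python
import numpy as np
import pandas as pd


def lagged_columns(series, lags):
    """Hypothetical helper: one lagged copy of `series` per entry in `lags`.

    Assumes non-negative integer lags and numeric data.
    """
    values = series.to_numpy(dtype=float)
    n = len(values)
    # Start from all-NaN so the first `lag` rows of each column stay missing.
    out = np.full((n, len(lags)), np.nan)
    for j, lag in enumerate(lags):
        if lag < n:
            out[lag:, j] = values[:n - lag]
    return pd.DataFrame(
        out,
        index=series.index,
        columns=[f"{series.name}_{lag}" for lag in lags],
    )


df = pd.DataFrame(
    np.array([[1, 0, 0], [4, 5, 0], [7, 7, 7], [7, 4, 5], [4, 5, 0],
              [7, 8, 9], [3, 2, 9], [9, 3, 6], [6, 8, 5]]),
    columns=["a", "b", "c"],
    index=[f"1/1/{year}" for year in range(2000, 2009)],
)

nums = [1, 3, 5, 7]
result = pd.concat([df, lagged_columns(df["a"], nums)], axis=1)
print(result)
```

On the example frame this should produce the same table as the `pd.concat` one-liner above.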

What a nice and clean answer! Kudos! @panda-kim — Master Oogway, Nov 29 '22 at 06:10
