Access last X rows in Pandas apply

Question

I have a Pandas dataframe with one column of numbers, similar to this:

id - val
0  - 100
1  - 200
2  - 100
3  - 400
4  - 300
5  - 100
etc

What I would like to do is to add a second column which is a list/numpy array of the values from the previous 3 rows:

id - val - val_list
0  - 100 - [] # Or [NaN, NaN, NaN]
1  - 200 - [100] # Or [NaN, NaN, 100]
2  - 100 - [100, 200] # Or [NaN, 100, 200]
3  - 400 - [100, 200, 100]
4  - 300 - [200, 100, 400]
5  - 100 - [100, 400, 300]
etc

Any idea how to solve this efficiently, preferably without looping?

How many rows does this data frame usually have? – Bhanu Tez Oct 25 '18 at 05:34 — Bhanu Tez, Oct 25 '18 at 05:34
Order of NaN should be changed like `[nan, nan, 100.0]` ? – jezrael Oct 25 '18 at 05:40 — jezrael, Oct 25 '18 at 05:40
Yes you are right @Jezrael – Franc Weser Oct 25 '18 at 05:44 — Franc Weser, Oct 25 '18 at 05:44

jezrael · Accepted Answer · 2018-10-25T05:49:41.537

4

First I think working with lists in pandas is not good idea, if possible better is working with 2d numpy array here.

Use strides if performance is important:

N = 3
x = np.concatenate([[np.nan] * (N), df['val'].values])

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
arr = rolling_window(x, N)

df['val_list'] = arr[:-1].tolist()
print (df)
   id  val               val_list
0   0  100        [nan, nan, nan]
1   1  200      [nan, nan, 100.0]
2   2  100    [nan, 100.0, 200.0]
3   3  400  [100.0, 200.0, 100.0]
4   4  300  [200.0, 100.0, 400.0]
5   5  100  [100.0, 400.0, 300.0]

edited Oct 25 '18 at 05:49

answered Oct 25 '18 at 05:33

jezrael

822,522
95
1,334
1,252

Downvoter, if there's something wrong with my answer, please let me know, so I can correct it. Thanks. – jezrael Oct 25 '18 at 05:44
Hey, thanks for the answer. Wasn't me who downvoted, give me a bit to check your solution – Franc Weser Oct 25 '18 at 05:47
@FrancWeser - I know it, you have no reason. – jezrael Oct 25 '18 at 05:47
2

+1. Slightly more clear would be to have: `shape = (a.size - window + 1, window)`, `strides = a.strides*2`. – Andreas K. Oct 25 '18 at 06:06

Bhanu Tez · Answer 2 · 2018-10-25T06:07:55.593

3

Hi use the following simple code.

df = pd.DataFrame([100,200,100,400,300,100],columns =['Val'])

temp = pd.concat([df.shift(3),df.shift(2),df.shift(1)],axis=1)
df['val_list'] = temp.apply(lambda x:x.tolist(),axis=1)

#

df = pd.DataFrame([100,200,100,400,300,100],columns =['Val'])
N=3
temp = pd.DataFrame()
for i in range(N,0,-1):
    temp = pd.concat([temp,df.shift(i)],axis=1)
df['val_list'] = temp.apply(lambda x:x.tolist(),axis=1)

edited Oct 25 '18 at 06:07

answered Oct 25 '18 at 05:42

Bhanu Tez

296
2
14

Hi, please let me know while downvoting, so I can correct the code. – Bhanu Tez Oct 25 '18 at 05:50
Hi, thanks for the answer. Didn't downvote, but do you have an idea how to implement if the shift is not 3 but lets say 300 (so a list of the last 300 rows)? Don't want to write df.shift(300)...df.shift(1), thanks – Franc Weser Oct 25 '18 at 05:53
Oh okay, if you need the last 300 rows.. definitely my implementation is of no use.. Thanks for clearing.. looks like today is my bad day.. so many people downvoting my answers.. I am just wondering why :( – Bhanu Tez Oct 25 '18 at 05:58
Appreciate still, wish you the best! – Franc Weser Oct 25 '18 at 06:04
2

Hi, Franc.. wrote one more logic.. but with for loop(but you said you want to avoid for loop).. have a look.. thanks for upvote.. – Bhanu Tez Oct 25 '18 at 06:09
1

Yep I'm trying to avoid loops where possible due to performance.. thanks!! – Franc Weser Oct 25 '18 at 06:26

Access last X rows in Pandas apply

2 Answers2