python time series lag by shift(1), how to fillna for the created NaN

Question

I have a very large dataset containing ids and time-series data points (with some missing values). The following is just an example.

I need to create a lag variable for each group, which of course creates a NaN for the first observation in each group. I would like to assign the next available value to that created NaN specifically, but leave the other missing values untouched for later manipulation.

id    time    value    lag_value 
A     2000    10       NaN      # I want this to be 10, the next available value 
A     2001    11       10 
A     2002    NaN      11 
A     2003    14       NaN 
A     2004    10       14

Edit:

I think it would be cleaner to use first_valid_index to assign the next available value; see "Pandas - find first non-null value in column".

@StephenRauch not able to do manually for a really large dataset... — Jin, Apr 08 '18 at 23:15
Yes, your title says you are using `.shift()`. But since you did not show any code I am guessing here, but you did have to assign the `lag_value` by hand, as you put it. Why not one more line of code to set element 0 equal to element 1? — Stephen Rauch, Apr 08 '18 at 23:18

score 1 · Answer 1 · answered Apr 08 '18 at 23:19

Here you go. This fills the first value with the first non-NaN entry of the lagged column.

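import pandas as pd
import numpy as np
df = pd.DataFrame({'id': ['A', 'A', 'A', 'A', 'A'],
                  'time': [2000, 2001, 2002, 2003, 2004],
                  'value': [10, 11, np.nan, 14, 10]})  # np.nan: the np.NaN alias was removed in NumPy 2.0

# Lag by one row; this creates a NaN in the first row.
df['lag_value'] = df.value.shift(1)
# Overwrite only that first row with the first non-NaN lagged value.
df.loc[0, 'lag_value'] = df.lag_value[df.lag_value.notnull()].values[0]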

#  id  time  value  lag_value
#0  A  2000   10.0       10.0
#1  A  2001   11.0       10.0
#2  A  2002    NaN       11.0
#3  A  2003   14.0        NaN
#4  A  2004   10.0       14.0

Thanks! I will try to amend your answer as function to apply for groupby so that I can set created NaN for all groups. — Jin, Apr 08 '18 at 23:28
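
For the groupby case mentioned in the comment above, one possible sketch (not the answerer's code) is to lag within each id and then fill only the first row of each group. The two-group frame below, including group B and its values, is made up for illustration, and the rows are assumed to be already sorted by time within each id.

```python
import numpy as np
import pandas as pd

# Hypothetical two-group frame extending the example in the question.
df = pd.DataFrame({
    'id': ['A'] * 5 + ['B'] * 3,
    'time': [2000, 2001, 2002, 2003, 2004, 2000, 2001, 2002],
    'value': [10, 11, np.nan, 14, 10, np.nan, 7, 8],
})

# Lag within each id so values never leak from one group into the next.
df['lag_value'] = df.groupby('id')['value'].shift(1)

# First non-NaN lagged value per group, broadcast to every row of that group.
first_lag = df.groupby('id')['lag_value'].transform('first')

# The NaN created by shift() is always the first row of each group.
is_first_row = df.groupby('id').cumcount() == 0
df.loc[is_first_row, 'lag_value'] = first_lag[is_first_row]

print(df)
```

Only the first row of each group is overwritten, so NaNs that come from missing values in the original series (for example A in 2003) are left alone.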

score 1 · Answer 2 · answered Apr 09 '18 at 00:12

Since you mention first_valid_index:

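s = df.value.shift()
# first_valid_index() returns an index label; with the default RangeIndex it also works as a position.
s.iloc[s.first_valid_index() - 1] = df.value.iloc[0]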
s
Out[110]: 
0    10.0
1    10.0
2    11.0
3     NaN
4    14.0
Name: value, dtype: float64
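
A caveat worth noting: first_valid_index() returns an index label, so the positional arithmetic above relies on the default 0..n-1 index. A minimal sketch of a variant that should also work with other indexes is below; the year-based index is an assumption for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same values as the example, but indexed by year (assumed for illustration).
df = pd.DataFrame({'value': [10, 11, np.nan, 14, 10]},
                  index=[2000, 2001, 2002, 2003, 2004])

s = df['value'].shift()

# first_valid_index() gives a label, or None when the series is all NaN.
first_label = s.first_valid_index()
if first_label is not None:
    # Position 0 is the NaN created by shift(); fill it with the first valid lag.
    s.iloc[0] = s.loc[first_label]

print(s)
```

Looking the value up with .loc and writing it with .iloc[0] keeps labels and positions separate, so the later NaN that comes from the original missing value stays untouched.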

BENY

