I have the following time series dataframe. I would like to fill the missing values with the previous value. However i would only want to fill the missing values between the first_valid_index and the last_valid index. So the columns i wanna fill will be different for each row. How can i do this?
So, given this dataframe.
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2 ,3,np.nan,5], [1, 3 , np.nan , 4 , np.nan], [4, np.nan , 7 , np.nan,np.nan]], columns=[2007,2008,2009,2010,2011])
Input dataframe:
2007 2008 2009 2010 2011
1 2 3 NaN 5
1 3 NaN 4 NaN
4 Nan 7 NaN NaN
Output dataframe:
2007 2008 2009 2010 2011
1 2 3 3 5
1 3 3 4 NaN
4 4 7 NaN NaN
I thought of creating new columns for first_valid_index and last_valid_index and then using .apply() but how can i fill different columns per row?
def fillMissing(x):
first_valid = int(x["first_valid"])
last_valid = int(x["last_valid"])
for i in range(first_valid,last_valid + 1):
missing.append(i)
#What should i do here since the following is not valid
#x[missing] = x[missing].fillna(method='ffill', axis=1)
df.apply(fillMissing , axis=1)