Pandas: interpolation where first and last data point in column is NaN

Question

I would like to use the interpolate function, but only between known data values in a pandas DataFrame column. The issue is that the first and last values in the column are often NaN and sometimes it can be many rows before a value is not NaN:

      col 1    col 2
 0    NaN      NaN
 1    NaN      NaN
...
1000   1       NaN
1001  NaN       1   <-----
1002   3       NaN  <----- only want to fill in these 'in between value' rows
1003   4        3
...
3999  NaN      NaN
4000  NaN      NaN

I am tying together a dataset which is updated 'on event' but separately for each column, and is indexed via Timestamp. This means that there are often rows where no data is recorded for some columns, hence a lot of NaNs!

Could you post the code and data of `tying together a dataset`? — WoodChopper, Nov 13 '15 at 11:52
ummm the data is quite large but I'll add a snippit if I can... I tie together using a simple pd.concat which fills in NaNs where data isn't recorded for the same index in either one of the dataframes that are being concatenated: pdDataFrame_ToAppend = pd.concat([pdDataFrame_ToAppend,tempData], join='outer') — HStro, Nov 13 '15 at 12:18

score 1 · Accepted Answer · edited May 23 '17 at 10:33

1

I select by min and max value of column by function idxmin and idxmax and use function fillna with method forward filling.

print df
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001    NaN      1
#1002      3    NaN
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

df.loc[df['col 1'].idxmin(): df['col 1'].idxmax()] = df.loc[df['col 1'].idxmin(): df['col 1'].idxmax()].fillna(method='ffill')
df.loc[df['col 2'].idxmin(): df['col 2'].idxmax()] = df.loc[df['col 2'].idxmin(): df['col 2'].idxmax()].fillna(method='ffill')
print df
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001      1      1
#1002      3      1
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

Added different solution, thanks HStro.

df['col 1'].loc[df['col 1'].first_valid_index() : df['col 1'].last_valid_index()] = df['col 1'].loc[df['col 1'].first_valid_index(): df['col 1'].last_valid_index()].astype(float).interpolate()

edited May 23 '17 at 10:33

Community

1
1

answered Nov 13 '15 at 11:53

jezrael

822,522
95
1,334
1,252

that looks cool. Do you know if you can use interpolation in the same way? – HStro Nov 13 '15 at 12:16
You can try `apply(pandas.Series.interpolate)` instead of `fillna`. – jezrael Nov 13 '15 at 12:29
Ok I'm going to give that a go and let you know, cheers for the help! – HStro Nov 13 '15 at 12:56
Right so it worked, I had to change it a bit though. I used this: df_azure[col].loc[df_azure[col].first_valid_index() : df_azure[col].last_valid_index()] = df_azure[col].loc[df_azure[col].first_valid_index(): df_azure[col].last_valid_index()].astype(float).interpolate() – HStro Nov 16 '15 at 16:20

Pandas: interpolation where first and last data point in column is NaN

1 Answers1

Linked