3

I would like to use the interpolate function, but only between known data values in a pandas DataFrame column. The issue is that the first and last values in the column are often NaN and sometimes it can be many rows before a value is not NaN:

      col 1    col 2
 0    NaN      NaN
 1    NaN      NaN
...
1000   1       NaN
1001  NaN       1   <-----
1002   3       NaN  <----- only want to fill in these 'in between value' rows
1003   4        3
...
3999  NaN      NaN
4000  NaN      NaN

I am tying together a dataset which is updated 'on event' but separately for each column, and is indexed via Timestamp. This means that there are often rows where no data is recorded for some columns, hence a lot of NaNs!

HStro
  • 145
  • 1
  • 11
  • Could you post the code and data of `tying together a dataset`? – WoodChopper Nov 13 '15 at 11:52
  • ummm the data is quite large but I'll add a snippit if I can... I tie together using a simple pd.concat which fills in NaNs where data isn't recorded for the same index in either one of the dataframes that are being concatenated: pdDataFrame_ToAppend = pd.concat([pdDataFrame_ToAppend,tempData], join='outer') – HStro Nov 13 '15 at 12:18

1 Answers1

1

I select by min and max value of column by function idxmin and idxmax and use function fillna with method forward filling.

print df
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001    NaN      1
#1002      3    NaN
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

df.loc[df['col 1'].idxmin(): df['col 1'].idxmax()] = df.loc[df['col 1'].idxmin(): df['col 1'].idxmax()].fillna(method='ffill')
df.loc[df['col 2'].idxmin(): df['col 2'].idxmax()] = df.loc[df['col 2'].idxmin(): df['col 2'].idxmax()].fillna(method='ffill')
print df
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001      1      1
#1002      3      1
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

Added different solution, thanks HStro.

df['col 1'].loc[df['col 1'].first_valid_index() : df['col 1'].last_valid_index()] = df['col 1'].loc[df['col 1'].first_valid_index(): df['col 1'].last_valid_index()].astype(float).interpolate()
Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • that looks cool. Do you know if you can use interpolation in the same way? – HStro Nov 13 '15 at 12:16
  • You can try `apply(pandas.Series.interpolate)` instead of `fillna`. – jezrael Nov 13 '15 at 12:29
  • Ok I'm going to give that a go and let you know, cheers for the help! – HStro Nov 13 '15 at 12:56
  • Right so it worked, I had to change it a bit though. I used this: df_azure[col].loc[df_azure[col].first_valid_index() : df_azure[col].last_valid_index()] = df_azure[col].loc[df_azure[col].first_valid_index(): df_azure[col].last_valid_index()].astype(float).interpolate() – HStro Nov 16 '15 at 16:20