If I have a df similar to this one:
```
print(df)
                       A  B  C    D    E
DATE_TIME
2016-08-10 13:57:00  3.6  A  1  NaN  NaN
2016-08-10 13:58:00  4.7  A  1  4.5  NaN
2016-08-10 13:59:00  3.4  A  0  NaN  5.7
2016-08-10 14:00:00  3.5  A  0  NaN  NaN
2016-08-10 14:01:00  2.6  A  0  4.6  NaN
2016-08-10 14:02:00  4.8  A  0  NaN  4.3
2016-08-10 14:03:00  5.7  A  1  NaN  NaN
2016-08-10 14:04:00  5.5  A  1  5.7  NaN
2016-08-10 14:05:00  5.6  A  1  NaN  NaN
2016-08-10 14:06:00  7.8  A  1  NaN  5.2
2016-08-10 14:07:00  8.9  A  0  NaN  NaN
2016-08-10 14:08:00  3.6  A  0  NaN  NaN
```
```
print(df.dtypes)
A    float64
B     object
C      int64
D    float64
E    float64
dtype: object
```
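To make the example reproducible, here is a sketch that rebuilds df from the printout. The values are copied by hand. It assumes the index is a DatetimeIndex named DATE_TIME with one-minute spacing, which is what the printout suggests.

```python
import numpy as np
import pandas as pd

# Rebuild the question's df from the printout (assumed: 1-minute DatetimeIndex named DATE_TIME).
index = pd.date_range("2016-08-10 13:57:00", periods=12, freq="min", name="DATE_TIME")
df = pd.DataFrame(
    {
        "A": [3.6, 4.7, 3.4, 3.5, 2.6, 4.8, 5.7, 5.5, 5.6, 7.8, 8.9, 3.6],
        "B": ["A"] * 12,
        "C": [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
        "D": [np.nan, 4.5, np.nan, np.nan, 4.6, np.nan,
              np.nan, 5.7, np.nan, np.nan, np.nan, np.nan],
        "E": [np.nan, np.nan, 5.7, np.nan, np.nan, 4.3,
              np.nan, np.nan, np.nan, 5.2, np.nan, np.nan],
    },
    index=index,
)
print(df.dtypes)
```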
Thanks to a lot of input from the community, I now have this code, which upsamples my df to one-second intervals and applies a different fill method to each dtype:
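```python
import pandas as pd

int_cols = df.select_dtypes(['int64']).columns
index = pd.date_range(df.index[0], df.index[-1], freq="s")
df2 = df.reindex(index)
for col in df2:
    if col in int_cols:
        # int columns become float64 after reindexing (new rows are NaN): fill, then cast back
        df2[col] = df2[col].ffill().astype(int)
    elif df2[col].dtype == float:
        # float columns: linear interpolation
        df2[col] = df2[col].interpolate()
    else:
        # everything else (e.g. object columns): forward-fill
        df2[col] = df2[col].ffill()
```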
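The cast back to int in the first branch is needed because reindexing inserts rows with no data. An int64 column cannot hold NaN, so pandas upcasts it to float64. Below is a toy sketch of that round trip. The two-value series is made up for illustration.

```python
import pandas as pd

# Hypothetical two-point int series, one minute apart.
s = pd.Series([1, 0], index=pd.date_range("2016-08-10 13:57:00", periods=2, freq="min"))

# Reindexing to one-second spacing adds 59 rows with no data (NaN);
# int64 cannot hold NaN, so pandas upcasts the result to float64.
up = s.reindex(pd.date_range(s.index[0], s.index[-1], freq="s"))
print(up.dtype)  # float64

# After forward-filling there are no NaNs left, so casting back to int is safe.
restored = up.ffill().astype(int)
print(restored.dtype, len(restored))
```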
I am now looking for a way to interpolate only between my actual measurements. Currently, interpolate carries the last measurement of each column forward to the end of the df:
```
df2.tail()
Out[75]:
                            A  B  C    D    E
2016-08-10 14:07:56  3.953333  A  0  5.7  5.2
2016-08-10 14:07:57  3.865000  A  0  5.7  5.2
2016-08-10 14:07:58  3.776667  A  0  5.7  5.2
2016-08-10 14:07:59  3.688333  A  0  5.7  5.2
2016-08-10 14:08:00  3.600000  A  0  5.7  5.2
```
But I would like the interpolation to stop at each column's last measurement (for example at 14:04:00 in col['D'] and at 14:06:00 in col['E']) and leave the remaining values as NaN.
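For the float columns, pandas' interpolate has a limit_area parameter (added in pandas 0.23) that seems to match this directly: limit_area='inside' only fills NaNs that have a valid value on both sides. By contrast, limit caps how many consecutive NaNs are filled, so it does not control where filling stops. Below is a minimal sketch on column D alone, assuming a pandas version that supports limit_area.

```python
import numpy as np
import pandas as pd

# Column D from the question at one-minute resolution: last measurement at 14:04.
idx = pd.date_range("2016-08-10 13:57:00", "2016-08-10 14:08:00", freq="min")
d = pd.Series([np.nan, 4.5, np.nan, np.nan, 4.6, np.nan,
               np.nan, 5.7, np.nan, np.nan, np.nan, np.nan], index=idx)

up = d.reindex(pd.date_range(idx[0], idx[-1], freq="s"))

# Default behaviour: trailing NaNs after 14:04:00 are filled with the last value (5.7).
default = up.interpolate()

# limit_area="inside" only fills NaNs with a valid value on both sides,
# so leading and trailing NaNs are left untouched.
inside = up.interpolate(limit_area="inside")

print(default.last_valid_index())  # 2016-08-10 14:08:00
print(inside.last_valid_index())   # 2016-08-10 14:04:00
```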
I tried setting limit to 0 and limit_direction to 'both':
```python
for col in df2:
    if col in int_cols:
        df2[col] = df2[col].ffill().astype(int)
    elif df2[col].dtype == float:
        df2[col] = df2[col].interpolate(limit=0, limit_direction='both')
    else:
        df2[col] = df2[col].ffill()
```
but this didn't change the output at all. I then tried to incorporate the solution I found in this question, "Pandas: interpolation where first and last data point in column is NaN", into my code:
```python
for col in df2:
    if col in int_cols:
        df2[col] = df2[col].ffill().astype(int)
    elif df2[col].dtype == float:
        df2[col].loc[df2[col].first_valid_index(): df2[col].last_valid_index()]=df2[col].loc[df2[col].first_valid_index(): df2[col].last_valid_index()].astype(float).interpolate(inplace=True)
    else:
        df2[col] = df2[col].ffill()
```
...but that did not work, and my float64 columns are now entirely NaN. Also, the way I inserted the code, I know it would only have affected the float columns. Ideally, I would like to apply the same first_valid_index():last_valid_index() selection to the object and int64 columns as well. Can somebody help me? Thank you!
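The all-NaN result is most likely caused by interpolate(inplace=True). It modifies the object in place and returns None, so the right-hand side of that assignment is None, which gets written into the slice as NaN. The chained df2[col].loc[...] = ... form may also fail to write back to df2, depending on the pandas version and its copy-on-write behaviour.

Below is a sketch that applies the first_valid_index()/last_valid_index() restriction to every column, whatever its dtype. It interpolates columns that were float in the original df and forward-fills all others. upsample_inside is a hypothetical helper name. The cast back to int assumes the int64 columns have no missing values in df, which holds for the example.

```python
import numpy as np
import pandas as pd


def upsample_inside(df: pd.DataFrame, freq: str = "s") -> pd.DataFrame:
    """Hypothetical helper: reindex df to `freq` and fill gaps only between
    each column's first and last valid timestamp."""
    int_cols = df.select_dtypes(["int64"]).columns
    index = pd.date_range(df.index[0], df.index[-1], freq=freq, name=df.index.name)
    out = df.reindex(index)

    for col in out.columns:
        first = out[col].first_valid_index()
        last = out[col].last_valid_index()
        if first is None:  # column has no data at all: leave it as NaN
            continue
        inner = out[col].loc[first:last]
        # Decide by the dtype in the *original* df, since reindexing turned int columns into float.
        if pd.api.types.is_float_dtype(df[col]):
            inner = inner.interpolate()  # linear interpolation between real measurements
        else:
            inner = inner.ffill()  # int and object columns: carry the last value forward
        # Use the returned Series (interpolate(inplace=True) would return None) and
        # assign through out.loc instead of a chained out[col].loc[...] = ... .
        out.loc[first:last, col] = inner
        if col in int_cols:
            # Assumes int columns had no NaN in df, so no NaN is left to block the cast.
            out[col] = out[col].astype(df[col].dtype)
    return out


# Usage on the first six rows of the question's df.
idx = pd.date_range("2016-08-10 13:57:00", periods=6, freq="min", name="DATE_TIME")
demo = pd.DataFrame(
    {
        "A": [3.6, 4.7, 3.4, 3.5, 2.6, 4.8],
        "B": ["A"] * 6,
        "C": [1, 1, 0, 0, 0, 0],
        "D": [np.nan, 4.5, np.nan, np.nan, 4.6, np.nan],
        "E": [np.nan, np.nan, 5.7, np.nan, np.nan, 4.3],
    },
    index=idx,
)
result = upsample_inside(demo)
print(result.loc["2016-08-10 14:00:58":"2016-08-10 14:01:02"])
```

With this approach, D stays NaN before 13:58:00 and after 14:01:00 (its first and last readings in the demo rows), while columns measured up to the last row are filled to the end.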