I'm trying to interpolate time series data in a pandas DataFrame. I have hourly data for temp, and I want to interpolate the temp values at the half-hourly points. That way I get an estimate of temp for every trading period of each day, i.e. 24 hours per day, so 48 half-hourly trading periods per day.
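The numbering in the MWE below (`df.index.hour*2+1`, so each hh:00 reading is an odd period) extends to the half-hours as in this small sketch. `trading_period` is a hypothetical helper, and it assumes period 1 starts at 00:00 and each period lasts 30 minutes:

```python
import pandas as pd


def trading_period(ts: pd.Timestamp) -> int:
    """Hypothetical helper: map a timestamp to its trading period (1..48) in the day.

    Assumes period 1 starts at 00:00 and every period is 30 minutes long,
    so hh:00 gives an odd period and hh:30 gives the following even period.
    """
    return ts.hour * 2 + ts.minute // 30 + 1


# Every half-hour of one day, 00:00 through 23:30
day = pd.date_range('2020-10-21', periods=48, freq='30min')
periods = [trading_period(ts) for ts in day]
print(periods[:4], periods[-1])  # [1, 2, 3, 4] 48
```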
My MWE is:

```python
import numpy as np
import pandas as pd
from datetime import datetime, date, timedelta
import pyarrow as pa
import pyarrow.parquet as pq
# my dataset
df = pd.DataFrame()
d1 = '2020-10-21'
d2 = '2020-10-22'
df['date'] = pd.to_datetime([d1]*24+[d2]*24, format='%Y-%m-%d')
df['time'] = pd.date_range(d1, periods=len(df), freq='H').time
df['temp'] = pd.DataFrame((50+20*np.sin(np.linspace(0,0.91*np.pi,len(df))))).values
# combine time and date
df.loc[:,'datetime'] = pd.to_datetime(df.date.astype(str)+' '+df.time.astype(str))
df = df.drop(['date','time'], axis=1)
df = df.set_index('datetime')
# trading period
df['tp'] = pd.DataFrame(df.index.hour.values*2+1).values
# interpolate to find temp and datetime for trading periods 2,4,6,...
for n in df.tp.values:
    df.loc[-1,'tp'] = n+1
df = df.sort_values('tp').reset_index(drop=True)
#df = df.interpolate(method='linear')
print(df.head(10))
```
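For comparison, a common pandas way to get half-hourly values without inserting rows one at a time is to reindex onto a half-hourly grid and let `interpolate` fill the new rows. This is a sketch under a few assumptions: it rebuilds the same hourly data directly on a `DatetimeIndex` (equivalent to combining `date` and `time` above), the names `hourly`, `grid` and `half_hourly` are just illustrative, and it uses `method='time'`. The grid only spans the data, so 23:30 on the last day (trading period 48) is not produced, since that point would need extrapolation.

```python
import numpy as np
import pandas as pd

# Same hourly data as in the MWE, built directly on a DatetimeIndex
idx = pd.date_range('2020-10-21', periods=48, freq='h')
hourly = pd.DataFrame(
    {'temp': 50 + 20 * np.sin(np.linspace(0, 0.91 * np.pi, len(idx)))},
    index=idx,
)

# Half-hourly grid from the first to the last hourly timestamp.
# Every hourly timestamp lies on this grid, so reindex keeps the known
# values and inserts NaN rows at hh:30.
grid = pd.date_range(hourly.index[0], hourly.index[-1], freq='30min')

# Linear interpolation weighted by the time between index points
half_hourly = hourly.reindex(grid).interpolate(method='time')

# Trading period within each day: hh:00 -> odd periods, hh:30 -> even periods
half_hourly['tp'] = half_hourly.index.hour * 2 + half_hourly.index.minute // 30 + 1

print(half_hourly.head(4))
```

Because the hourly points are evenly spaced, each hh:30 value comes out as the midpoint of its two neighbouring hourly readings.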
I'm adapting the answer in this post, but I get the error `TypeError: value should be a 'Timestamp' or 'NaT'. Got 'int' instead.`
I suspect it's caused by the `df.loc[-1,'tp'] = n+1` line, but I'm not sure how to fix it.
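For context on that line: `loc` is label-based, so `df.loc[-1, 'tp'] = ...` does not mean "the last row" or "a new row at the end". It looks up a row whose index label is `-1` and, if none exists, adds a new row with that label (setting with enlargement). Row-insertion answers that use this pattern rely on the frame having an integer `RangeIndex`, as in this toy sketch with made-up values. In the MWE, `df` is still indexed by `datetime` when the loop runs, so the label `-1` is not a Timestamp, which would plausibly explain the quoted `TypeError` (the exact behaviour may differ between pandas versions).

```python
import pandas as pd

# Toy frame with the default integer RangeIndex (labels 0, 1, 2); values are made up
df = pd.DataFrame({'tp': [1, 3, 5], 'temp': [50.0, 51.0, 52.0]})

# loc is label-based: no row is labelled -1, so this appends a new row with
# index label -1, tp = 2 and temp = NaN (setting with enlargement)
df.loc[-1, 'tp'] = 2

# Sorting by tp moves the new row into place; reset_index renumbers labels 0..3
df = df.sort_values('tp').reset_index(drop=True)
print(df)
```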