I am trying to remove spikes from time series data stored in a pandas DataFrame.
value = 5000
for index, row in gauteng_df.iterrows():
    if index == gauteng_df.shape[0]-1:
        break
    upper, lower = row['Admissions to Date'] + value, row['Admissions to Date'] - value
    a = gauteng_df.iloc[index+1]['Admissions to Date']
    if a > upper or a < lower:
        a = (gauteng_df.iloc[index-1]['Admissions to Date'] + gauteng_df.iloc[index+1]['Admissions to Date'])/2
        gauteng_df.iloc[index]['Admissions to Date'] = a
My approach is to compare each data point with the subsequent one. If the current data point falls outside the interval around the subsequent data point (i.e. subsequent point ± value), the current data point is replaced by the average of the previous and next data points. Unfortunately, when I plot the result, no changes are reflected and the spikes are still there.
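One possible cause of the unchanged plot, as far as I can tell, is the chained assignment on the last line of the loop: gauteng_df.iloc[index] builds a separate row object first, so assigning into it may never reach the DataFrame. A minimal sketch of the difference, using a small made-up frame (df is a hypothetical stand-in for gauteng_df, and the numbers are invented):

```python
import pandas as pd

# Hypothetical stand-in for gauteng_df, with one obvious spike at position 2.
df = pd.DataFrame({"Admissions to Date": [100.0, 110.0, 9000.0, 130.0, 140.0]})

# Integer position of the column, so it can be used with .iloc.
col = df.columns.get_loc("Admissions to Date")

# The chained form, df.iloc[2]["Admissions to Date"] = ..., assigns into a
# temporary row object, so df itself may stay unchanged.
# A single .iloc call with (row, column) positions writes into df directly.
df.iloc[2, col] = (df.iloc[1, col] + df.iloc[3, col]) / 2

print(df)
```

Here the spike at position 2 is overwritten with the average of its two neighbours, and the change is visible in df afterwards.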
I would appreciate any help with this! Also, df.iterrows() might not be the most efficient method, so I would appreciate suggestions for a better way to replace the spike values.
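To illustrate the kind of loop-free alternative I have in mind, here is a rough sketch using shift(). Note that it applies a slightly different rule from my loop: a point counts as a spike when it differs from the average of its two neighbours by more than the threshold. The function name replace_spikes and the sample numbers are made up for illustration.

```python
import pandas as pd

def replace_spikes(series: pd.Series, threshold: float) -> pd.Series:
    """Replace points that deviate from their neighbours' mean by more than threshold.

    Hypothetical helper: the first and last points have only one neighbour,
    so their neighbour mean is NaN and they are never flagged.
    """
    # Average of the previous and next value for every position.
    neighbour_mean = (series.shift(1) + series.shift(-1)) / 2
    # Comparisons against NaN evaluate to False, so the end points are kept.
    is_spike = (series - neighbour_mean).abs() > threshold
    # Substitute the neighbour mean only where a spike was flagged.
    return series.mask(is_spike, neighbour_mean)

# Made-up cumulative admissions with a single spike at position 2.
admissions = pd.Series([1000.0, 1200.0, 9000.0, 1500.0, 1700.0, 1900.0])
cleaned = replace_spikes(admissions, threshold=5000)
print(cleaned)
```

On the real data this would be applied as gauteng_df['Admissions to Date'] = replace_spikes(gauteng_df['Admissions to Date'], value). One caveat: a spike also inflates the neighbour mean of the points next to it, so the threshold needs to be large enough that those adjacent points are not flagged as well.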