I have the data frame below, which holds time-series data. I process it to produce the input for my prediction models.
import pandas as pd

df = pd.DataFrame({
    "timestamp": [pd.Timestamp('2019-01-01 01:00:00', tz=None),
                  pd.Timestamp('2019-01-01 01:00:00', tz=None),
                  pd.Timestamp('2019-01-01 01:00:00', tz=None),
                  pd.Timestamp('2019-01-01 02:00:00', tz=None),
                  pd.Timestamp('2019-01-01 02:00:00', tz=None),
                  pd.Timestamp('2019-01-01 02:00:00', tz=None),
                  pd.Timestamp('2019-01-01 03:00:00', tz=None),
                  pd.Timestamp('2019-01-01 03:00:00', tz=None),
                  pd.Timestamp('2019-01-01 03:00:00', tz=None)],
    "value": [5.4, 5.1, 100.8, 20.12, 21.5, 80.08, 150.09, 160.12, 20.06]
})
From this, I take the mean of value for each timestamp and send that mean as the input to the predictor. Currently I use fixed thresholds to filter out outliers, but they remove some real values and also miss some outliers.
For example, I used
df[(df['value'] > 3) & (df['value'] < 120)]
and this does not filter out
2019-01-01 01:00:00 100.8
which is an outlier for that timestamp, but it does filter out
2019-01-01 03:00:00 150.09
2019-01-01 03:00:00 160.12
which are not outliers for that timestamp.
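To make the effect concrete, the steps described above (global threshold filter, then mean per timestamp) can be sketched as follows. This is only a minimal sketch: the names `filtered` and `hourly_mean` are illustrative, and the sample data is rebuilt with `pd.to_datetime` so that the snippet runs on its own.

```python
import pandas as pd

# Same sample data as above: three readings per hourly timestamp.
timestamps = pd.to_datetime(
    ["2019-01-01 01:00:00"] * 3
    + ["2019-01-01 02:00:00"] * 3
    + ["2019-01-01 03:00:00"] * 3
)
df = pd.DataFrame({
    "timestamp": timestamps,
    "value": [5.4, 5.1, 100.8, 20.12, 21.5, 80.08, 150.09, 160.12, 20.06],
})

# Global threshold filter: the same bounds (3 and 120) for every timestamp.
filtered = df[(df["value"] > 3) & (df["value"] < 120)]

# Mean of the surviving values per timestamp; this is what goes to the predictor.
hourly_mean = filtered.groupby("timestamp")["value"].mean()
print(hourly_mean)
```

With this data, the 01:00 mean is pulled up to about 37.1 because 100.8 survives the filter, while the 03:00 mean drops to 20.06 because both 150.09 and 160.12 are removed.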
So how do I filter out, for each timestamp, the values that do not fit the rest of that group?
Any help is appreciated.