I need to identify outliers in my DataFrame. In my case, an outlier is a value whose absolute Z-score is greater than 4. The DataFrame has many columns and a Date column with timestamps such as 2012-01-01 1:30:00, and it is sorted by date.
The values follow a daily pattern, as temperature data does, so each value has to be judged against the other values recorded at the same time of day (the same hour on other dates). For example, if an afternoon record is compared with values from all other periods, it may be wrongly flagged as discrepant.
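One way to state this criterion, assuming "the same time" means the same clock hour on any date (hourly bins): for a value $x_t$ recorded at timestamp $t$, let $h(t)$ be its hour, and let $\mu_h$ and $\sigma_h$ be the mean and standard deviation of all values of that column recorded in hour $h$. Then

$$
z_t = \frac{x_t - \mu_{h(t)}}{\sigma_{h(t)}}, \qquad x_t \text{ is flagged as an outlier if } |z_t| > 4 .
$$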
I tried the following for a single column, but it does not work:

    import numpy as np
    import pandas as pd

    hours = ['00:00','01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','23:59']
    df = pd.read_excel(file)
    df.set_index('Date', inplace=True)
    # intended: mark values of column1 whose Z-score within each one-hour window exceeds 4
    for i in range(24):
        df.loc[df[np.abs((df['column1'].between_time(hours[i],hours[i+1]) - df['column1'].between_time(hours[i],hours[i+1]).mean())/df['column1'].between_time(hours[i],hours[i+1]).std()) > 4], 'column1']='outlier'
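
A few things likely go wrong in the loop: `df[...]` is given a boolean Series that covers only one hour, which pandas cannot align with the full frame, and `between_time` includes both endpoints by default, so values exactly on the hour fall into two windows. A minimal sketch of the per-hour comparison for every column at once is below. It assumes the Date column parses to a DatetimeIndex, every column is numeric, and "the same time" means the same clock hour on any date. It swaps the `between_time` loop for `groupby(df.index.hour).transform(...)`. The function name `flag_hourly_outliers` and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_hourly_outliers(df: pd.DataFrame, threshold: float = 4.0) -> pd.DataFrame:
    """Return a boolean DataFrame that is True where |Z-score| > threshold.

    Each value is standardized against the mean and standard deviation of its
    own column, restricted to rows with the same clock hour (df.index.hour).
    Assumes a DatetimeIndex and all-numeric columns.
    """
    by_hour = df.groupby(df.index.hour)
    z = (df - by_hour.transform("mean")) / by_hour.transform("std")
    # NaN Z-scores (e.g. an hour with a single record) compare as False.
    return z.abs() > threshold


# With the real file this would be something like:
# df = pd.read_excel(file, index_col="Date", parse_dates=True)
# Synthetic stand-in: 30 days of hourly readings with a daily cycle.
idx = pd.date_range("2012-01-01 00:30", periods=24 * 30, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
daily = 10 * np.sin(2 * np.pi * idx.hour.to_numpy() / 24)
df = pd.DataFrame(
    {
        "column1": daily + rng.normal(0, 0.5, len(idx)),
        "column2": daily + rng.normal(0, 0.5, len(idx)),
    },
    index=idx,
)
df.iloc[100, 0] += 20  # inject one spike into column1 (2012-01-05 04:30)

mask = flag_hourly_outliers(df)
print(df.loc[mask["column1"], "column1"])

# Mirror the original idea of writing the string 'outlier' into flagged cells;
# casting to object first avoids mixing strings into a float column.
marked = df.astype(object).mask(mask, "outlier")
```

Keeping the result as a boolean mask leaves the numeric data untouched, so the flagged cells can later be dropped, set to NaN, or labelled, whichever suits the analysis.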