Pandas DataFrame: identify outliers and replace values in a DataFrame based on conditions

Question

I need to identify outlier values in my DataFrame; in my case, values whose Z-score exceeds 4 in absolute value. My DataFrame has many columns and is indexed and sorted by date (e.g. 2012-01-01 1:30:00).

This is my data set:

The values follow a daily time pattern, like the temperature data, so whether a value is discrepant has to be judged against the other values recorded at the same time of day. For example, comparing an afternoon record with values from other periods of the day could wrongly flag it as discrepant.
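As a sketch of the criterion described above (the notation is introduced here for illustration and is not from the question): let $h(t)$ be the hour of day of timestamp $t$, $n_h$ the number of readings in hour $h$, and $\bar{x}_h$ and $s_h$ their mean and sample standard deviation. Each value is then scored only against readings from the same hour:

```latex
z_t = \frac{x_t - \bar{x}_{h(t)}}{s_{h(t)}}, \qquad
\bar{x}_h = \frac{1}{n_h} \sum_{t:\, h(t) = h} x_t, \qquad
s_h = \sqrt{\frac{1}{n_h - 1} \sum_{t:\, h(t) = h} \left( x_t - \bar{x}_h \right)^2}

\text{flag } x_t \text{ as an outlier if } |z_t| > 4

|z_t| \le \frac{n_h - 1}{\sqrt{n_h}} \quad \Rightarrow \quad |z_t| > 4 \text{ requires } n_h \ge 18
```

The last line follows from Samuelson's inequality when the sample standard deviation is used; in practice it means a threshold of 4 can only flag anything once each hour of day has at least 18 readings.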

I tried the following for a single column, but it did not work:

import numpy as np
import pandas as pd

hours = ['00:00','01:00','02:00','03:00','04:00','05:00','06:00','07:00','08:00','09:00','10:00','11:00','12:00','13:00','14:00','15:00','16:00','17:00','18:00','19:00','20:00','21:00','22:00','23:00','23:59']

df = pd.read_excel(file)
df.set_index('Date', inplace=True)

for i in range(24):
    # values of column1 between hours[i] and hours[i+1]
    window = df['column1'].between_time(hours[i], hours[i+1])
    # absolute Z-score of each value within that window
    z = np.abs((window - window.mean()) / window.std())
    df.loc[df[z > 4], 'column1'] = 'outlier'
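A minimal sketch of one way to do this for every column at once, assuming `Date` has been set as a `DatetimeIndex` as in the question. In the loop above, the mask covers only one hourly window while `df[...]` expects a mask aligned with the whole DataFrame, and the result would be a DataFrame rather than row labels that `.loc` can use. The sketch avoids the loop by grouping rows on `df.index.hour`, so each row lands in exactly one hourly group (with its defaults, `between_time` includes both endpoints, so a reading at exactly 01:00 falls into two of the question's windows). The helper name `flag_outliers_by_hour` and the synthetic data are hypothetical, and replacing flagged cells with NaN via `DataFrame.mask` instead of the string `'outlier'` is an assumption.

```python
import numpy as np
import pandas as pd


def flag_outliers_by_hour(df: pd.DataFrame, threshold: float = 4.0) -> pd.DataFrame:
    """Hypothetical helper: mark cells whose |Z-score|, computed against all
    values recorded in the same hour of day, exceeds `threshold`."""
    by_hour = df.groupby(df.index.hour)
    mean = by_hour.transform("mean")  # per-hour mean, aligned with df's rows
    std = by_hour.transform("std")    # per-hour sample std (ddof=1)
    z = (df - mean) / std
    return z.abs() > threshold


# Synthetic hourly data (hypothetical): 60 days with a daily, temperature-like cycle
idx = pd.date_range("2012-01-01 00:00", periods=60 * 24, freq="60min", name="Date")
hour = idx.hour.to_numpy()
day = np.arange(len(idx)) // 24
base = 20 + 5 * np.sin(2 * np.pi * hour / 24) + 0.1 * (day % 5)
df = pd.DataFrame({"column1": base, "column2": 0.5 * base}, index=idx)

# Inject one spike into column1 at 14:00 on 2012-01-11
spike = pd.Timestamp("2012-01-11 14:00")
df.loc[spike, "column1"] += 50

mask = flag_outliers_by_hour(df)
cleaned = df.mask(mask)  # flagged cells become NaN; other cells are unchanged
print(df[mask.any(axis=1)])
```

Using `transform` keeps the per-hour mean and standard deviation aligned row by row with `df`, so the comparison works cell by cell across all columns. To keep the question's marker instead, `df.mask(mask, 'outlier')` should also work, at the cost of turning the affected columns into object dtype.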

Hi Mauricio, in the future, if you want people to be more likely to help you, do not post your data as an image. It makes it hard for us to move it to a space where we can work on it - d_kennetz, Oct 25 '18 at 20:43
Possible duplicate of [Detect and exclude outliers in Pandas dataframe](https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-dataframe) - jpp, Oct 25 '18 at 21:59
