Hello everyone,

I am trying to remove the outliers from my dataset. I defined the outlier boundaries as mean-3*std and mean+3*std. Now I want to delete the values smaller than mean-3*std and the values bigger than mean+3*std. Could you help me write the code for this? I am a beginner in Python. I already looked at similar questions, but they have not helped so far.

So far I have the following:

import pandas as pd

print(df_OmanAirTO.mean()-3*df_OmanAirTO.std(), df_OmanAirTO.mean()+3*df_OmanAirTO.std())

resulting in:

FuelFlow                2490.145718
ThrustDerateSmoothed       8.522145
CoreSpeed                 93.945180
EGTHotDayMargin            9.950557
EGT                      684.168701
TotalAirTemperature       11.980698
ThrustDerate              -3.780215

dtype: float64 

FuelFlow                4761.600157
ThrustDerateSmoothed      29.439075
CoreSpeed                101.360974
EGTHotDayMargin           90.414781
EGT                      915.952163
TotalAirTemperature       43.266653
ThrustDerate              44.672861

dtype: float64

Now I want to delete the values smaller than mean-3*std and delete the values bigger than mean+3*std. How can I do this?

Thank you in advance for helping me!

1 Answer

I assume you want to apply the outlier conditions to each column separately (i.e. in column FuelFlow, remove cells smaller than 2490.145718 or larger than 4761.600157; in column ThrustDerateSmoothed, remove cells smaller than 8.522145 or larger than 29.439075; and so on).

I would try this:

# Keep only the values inside [mean - 3*std, mean + 3*std] of each column; dropped cells show up as NaN once the columns are realigned.
filt_outliers_df_oman = df_OmanAirTO.apply(lambda x: x[(x >= x.mean() - 3*x.std()) &
                                                       (x <= x.mean() + 3*x.std())], axis=0)
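
If you prefer to avoid apply, the same per-column filter can probably be written with a boolean mask. This is only a minimal sketch: the small DataFrame below is made-up stand-in data (I do not have df_OmanAirTO), and the names lower, upper, in_range, masked and trimmed are my own. It shows two options, blanking the outlier cells or dropping the whole row.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for df_OmanAirTO (assumption): two numeric columns,
# with one planted outlier in FuelFlow at row 5.
df = pd.DataFrame({
    "FuelFlow": np.linspace(3500.0, 3700.0, 50),
    "EGT": np.linspace(780.0, 820.0, 50),
})
df.loc[5, "FuelFlow"] = 10000.0

# Per-column boundaries, computed the same way as in the question.
lower = df.mean() - 3 * df.std()
upper = df.mean() + 3 * df.std()

# True where a cell lies inside the band of its own column
# (the Series of bounds is aligned on the column names).
in_range = (df >= lower) & (df <= upper)

# Option 1: keep the shape of the frame, outlier cells become NaN.
masked = df.where(in_range)

# Option 2: drop every row that contains at least one outlier.
trimmed = df[in_range.all(axis=1)]

print(masked["FuelFlow"].isna().sum())
print(len(df), len(trimmed))
```

Which option fits depends on what you do next: NaN cells keep the rows aligned across columns, while dropping rows gives you a frame with no missing values. Note that the boundaries are computed from data that still contains the outliers, so a very large outlier widens its own band.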
gandhi_nn