I am trying to apply an outlier treatment to my time series data: I want to replace values above the 95th percentile with the 95th percentile value, and values below the 5th percentile with the 5th percentile value. I have written some code, but it does not produce the desired result.
I am trying to create an outliertreatment function that uses a helper function called cut. The code is given below:
    import numpy as np

    def outliertreatment(df, high_limit, low_limit):
        df_temp = df['y'].apply(cut, high_limit, low_limit, extra_kw=1)
        return df_temp

    def cut(column, high_limit, low_limit):
        conds = [column > np.percentile(column, high_limit),
                 column < np.percentile(column, low_limit)]
        choices = [np.percentile(column, high_limit),
                   np.percentile(column, low_limit)]
        return np.select(conds, choices, column)
I expect to pass the dataframe, with 95 as high_limit and 5 as low_limit, to the outliertreatment function. How can I achieve the desired result?
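For reference, here is a minimal sketch of the behaviour described above, not a definitive implementation. It assumes the values live in a column named 'y' with no missing values, and that the percentiles should be computed once over the whole column. Note that Series.apply calls the function once per element, so the sketch passes the whole column to cut directly and uses Series.clip to cap both tails. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd


def cut(column, high_limit, low_limit):
    # Compute both percentile bounds once, over the whole column.
    high = np.percentile(column, high_limit)
    low = np.percentile(column, low_limit)
    # Values below `low` become `low`; values above `high` become `high`.
    return column.clip(lower=low, upper=high)


def outliertreatment(df, high_limit, low_limit):
    # Pass the whole 'y' column (assumed column name) to cut in one call,
    # instead of Series.apply, which would call cut on each single value.
    return cut(df['y'], high_limit, low_limit)


# Hypothetical sample data with one obvious outlier at the top.
df = pd.DataFrame({'y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]})
treated = outliertreatment(df, 95, 5)
print(treated)
```

With this data, only the first and last values change, because they are the only ones outside the 5th and 95th percentile bounds. The result is a new Series, so the original dataframe is left untouched unless the result is assigned back to df['y'].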