I am trying to detect the outliers in a column using the IQR method. Once I have them, I want to set the corresponding values in my main dataframe to null so that I can impute them afterwards. This is how I implemented it:
import pandas as pd

df_outliers_detected = detect_outliers_IQR(df['Outliers'])
df_outliers_detected = pd.DataFrame(df_outliers_detected)
print(df_outliers_detected)
# Compare every row of df with every detected outlier value
for i in range(len(df)):
    for j in range(len(df_outliers_detected)):
        if df.loc[i, 'Outliers'] == df_outliers_detected.iloc[j, 0]:
            df.loc[i, 'Outliers'] = None
print(df['Outliers'].head(100))
These two nested for loops make the program really slow. Is there a better way to implement this?
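One possible way to drop the nested loops, sketched under the assumption that `detect_outliers_IQR` returns a Series obtained by boolean indexing (so it keeps the original index labels of `df`): those labels can be passed straight to `.loc`, setting every outlier cell to NaN in a single vectorized assignment. The function below is the same IQR rule without the debug prints, and the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

def detect_outliers_IQR(s):
    # Same IQR rule as in the question, without the debug prints
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

# Hypothetical example data (floats, so NaN fits the column dtype)
df = pd.DataFrame({"Outliers": [10.0, 12.0, 11.0, 13.0, 12.0, 100.0, 11.0, -50.0]})

outliers = detect_outliers_IQR(df["Outliers"])
# Boolean indexing preserves index labels, so they locate the rows to blank out
df.loc[outliers.index, "Outliers"] = np.nan
print(df["Outliers"])
```

Unlike the loop, this matches rows by index label rather than by value, so it also behaves correctly if `df` does not have a default 0..n-1 index.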
The code of the function "detect_outliers_IQR":
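def detect_outliers_IQR(df):
    # df is a single column (Series); compute its quartiles and IQR
    Q1 = df.quantile(0.25)
    Q3 = df.quantile(0.75)
    IQR = Q3 - Q1
    print(df)
    print("\n")
    # Keep only the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    df_outlier = df[((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR)))]
    print(len(df_outlier))
    return df_outlier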
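A variant worth considering, as a sketch only: have the helper return a boolean mask instead of the filtered values, then let `Series.mask` replace the flagged entries with NaN (its default replacement value). The name `outlier_mask_IQR` and the `k` parameter are hypothetical, introduced here for illustration; the sample data is also made up.

```python
import pandas as pd

def outlier_mask_IQR(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Hypothetical helper: True where s lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Hypothetical example data
df = pd.DataFrame({"Outliers": [10.0, 12.0, 11.0, 13.0, 12.0, 100.0, 11.0, -50.0]})

# Series.mask replaces values where the condition is True (NaN by default)
df["Outliers"] = df["Outliers"].mask(outlier_mask_IQR(df["Outliers"]))
print(df["Outliers"])
```

Returning the mask keeps detection and replacement aligned on the same index, and the resulting NaN cells can then be handed to whatever imputation step follows.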