I'm trying to pull the indices from each column where a value has been flagged as an outlier. I then want to combine all of those indices and remove the corresponding rows from my dataframe. I have a starting point below. I'm not sure whether the function should take the full dataset and detect each column's outliers internally, or whether I should instead call it inside a for loop and append the bad indexes to a list.
def find_outliers(series):
    # Takes a single column (a pandas Series), matching the call find_outliers(df[col])
    q1 = series.quantile(.25)
    q3 = series.quantile(.75)
    IQR = q3 - q1
    # Fences at 1.5 * IQR beyond the quartiles
    ll = q1 - (1.5*IQR)
    ul = q3 + (1.5*IQR)
    upper_outliers = series[series > ul].index.tolist()
    lower_outliers = series[series < ll].index.tolist()
    bad_indices = list(set(upper_outliers + lower_outliers))
    return bad_indices
bad_indexes = []
for col in df.columns:
    if df[col].dtype in ["int64","float64"]:
        # extend (not append) keeps one flat list of index labels
        bad_indexes.extend(find_outliers(df[col]))
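
For reference, here is a minimal sketch of how the whole idea might fit together, assuming the goal is to drop any row that is an outlier in at least one numeric column. The helper name `drop_outlier_rows` and the sample data are made up for illustration, and using `select_dtypes(include="number")` in place of the explicit dtype check is my own substitution, not something from the code above.

```python
import pandas as pd


def find_outliers(series):
    """Return index labels of values outside the 1.5 * IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    mask = (series < lower) | (series > upper)
    return series[mask].index.tolist()


def drop_outlier_rows(df):
    """Hypothetical helper: drop rows flagged in any numeric column."""
    bad_indices = set()  # a set removes labels flagged in several columns
    for col in df.select_dtypes(include="number").columns:
        bad_indices.update(find_outliers(df[col]))
    return df.drop(index=list(bad_indices)), bad_indices


# Made-up data, for illustration only
example = pd.DataFrame({
    "a": [1, 2, 3, 4, 100],
    "b": [10.0, 11.0, 12.0, -50.0, 13.0],
    "label": ["v", "w", "x", "y", "z"],
})
cleaned, removed = drop_outlier_rows(example)
print(removed)
print(cleaned)
```

Keeping `find_outliers` focused on a single column and doing the loop outside it makes the function easy to test on its own, while the set collects the labels across columns without duplicates. Whether dropping a whole row for one outlying value is appropriate depends on the data.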