I have a large dataset with mostly numeric columns and a few object-type (string) columns. I am trying to remove outliers from the numeric columns using quantiles, but I am having trouble skipping over the string columns. I want to iterate over each column, check whether it is an object type and, if it is not, calculate the IQR for that column, find the outliers, remove every row that contains an outlier, and move on to the next column. I tried several approaches, and the one below makes the most sense of those I came up with. The issue is that it filters out more than 90% of the rows, which I know isn't correct: I made another dataset with just the numeric columns, filtered those, and got a reasonable amount removed (<10%). I don't know how to implement this correctly and would appreciate any help.
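import numpy as np

def filter_outliers(df):
    # Only numeric columns take part in the IQR check; object (string) columns are skipped
    numeric_columns = df.select_dtypes(include=[np.number]).columns
    df_filtered = df.copy()
    for column in numeric_columns:
        q1 = df[column].quantile(0.25)
        q3 = df[column].quantile(0.75)
        iqr = q3 - q1
        lower_bound = q1 - 1.5 * iqr
        upper_bound = q3 + 1.5 * iqr
        # Rows of the original df whose value falls outside the bounds
        outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)]
        # Drop by index label; labels that were already removed are ignored
        df_filtered = df_filtered.drop(outliers.index, errors='ignore')
    return df_filtered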
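
For comparison, here is a minimal sketch of the same IQR rule written with a boolean row mask instead of `drop`. It rests on an assumption that I cannot confirm from the post: that the index of `df` may contain repeated labels (for example after a `concat` or an earlier filter). In that case `drop(outliers.index)` removes every row sharing a label with an outlier, which could explain an unexpectedly large loss. The name `filter_outliers_mask` and the parameter `k` are hypothetical, introduced only for this illustration.

```python
import numpy as np
import pandas as pd


def filter_outliers_mask(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Keep rows that have no IQR outlier in any numeric column (sketch)."""
    # Object (string) columns are left out of the outlier check
    numeric = df.select_dtypes(include=[np.number])

    # Per-column quartiles and bounds, computed once on the full data
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr

    # True where a value lies outside its column's bounds.
    # Comparisons with NaN are False, so missing values are never flagged.
    is_outlier = numeric.lt(lower, axis=1) | numeric.gt(upper, axis=1)

    # Select rows by position, so repeated index labels cannot remove extra rows
    keep = ~is_outlier.any(axis=1)
    return df.loc[keep.to_numpy()]
```

Calling `filter_outliers_mask(df)` returns all original columns, strings included, with only the flagged rows removed. If this version also removes most of the rows, the cause is more likely in the data itself: a numeric column whose IQR is 0 (a flag or a mostly constant column, say) treats every value different from the common one as an outlier, so it may be worth checking how many rows each column flags on its own.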