Given a dataframe df with 3 columns (for example: 'Country', 'Car' and 'Price'), how can I check for outliers that are more than 3 standard deviations from the mean, separately for each country and car? The code below works, but it is not efficient.
import numpy as np
import pandas as pd

sd = pd.DataFrame()
for country in df['Country'].unique():
    for car in df['Car'].unique():
        # copy the subset so that adding a column does not raise SettingWithCopyWarning
        chunk = df[(df['Country']==country) & (df['Car']==car)].copy()
        chunk['outlier'] = (np.abs(chunk['Price']-chunk['Price'].mean())) > 3*chunk['Price'].std()
        sd = pd.concat([sd, chunk])
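
One possible way to avoid the nested loops is to let pandas compute the per-group mean and standard deviation with groupby and transform, which returns values aligned to the original rows. The following is only a sketch under assumptions: the helper name flag_outliers, its parameters, and the sample data are made up for illustration, and it uses the same sample standard deviation (ddof=1) as Series.std() in the loop above.

```python
import pandas as pd


def flag_outliers(df, group_cols=('Country', 'Car'), value_col='Price', n_std=3):
    """Return a copy of df with a boolean 'outlier' column computed per group.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    grouped = df.groupby(list(group_cols))[value_col]
    # transform() broadcasts each group's statistic back onto that group's rows
    mean = grouped.transform('mean')
    std = grouped.transform('std')
    out = df.copy()
    out['outlier'] = (out[value_col] - mean).abs() > n_std * std
    return out


# Made-up sample data: one extreme price in the first group, none in the second
df = pd.DataFrame({
    'Country': ['US'] * 20 + ['DE'] * 20,
    'Car': ['Ford'] * 20 + ['BMW'] * 20,
    'Price': [100.0] * 19 + [1000.0] + [200.0 + i for i in range(20)],
})
result = flag_outliers(df)
print(result[result['outlier']])
```

Unlike the loop, this keeps the rows in their original order instead of regrouping them, and it only visits country and car combinations that actually occur in the data. Groups with a single row get a NaN standard deviation, so their rows are not flagged, which matches the behaviour of the loop version.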