
Given a DataFrame df with three columns (for example 'Country', 'Car' and 'Price'), how can I flag outliers that lie more than 3 standard deviations from the mean, computed separately for each country and car combination? The code below works, but it is not efficient.

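import numpy as np
import pandas as pd

sd = pd.DataFrame()
for country in df['Country'].unique():
    for car in df['Car'].unique():
        # .copy() avoids SettingWithCopyWarning when the new column is added
        chunk = df[(df['Country']==country) & (df['Car']==car)].copy()
        # flag prices more than 3 standard deviations from the group mean
        chunk['outlier'] = (np.abs(chunk['Price']-chunk['Price'].mean())) > 3*chunk['Price'].std()
        sd = pd.concat([sd, chunk])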
Balázs Fehér
  • Possible duplicate of [Remove outliers (+/- 3 std) and replace with np.nan in Python/pandas](http://stackoverflow.com/questions/29740216/remove-outliers-3-std-and-replace-with-np-nan-in-python-pandas). If that doesn't quite do what you want, take a look at the links in that question. There are a number of variants of this question that have been answered here; hopefully one or more of those answers your question. – JohnE Mar 09 '17 at 18:17
  • If you look at "zscore" here in the documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation that should mostly answer your question – JohnE Mar 09 '17 at 18:19
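
The groupby transformation mentioned in the comment above could look roughly like the following. This is only a minimal sketch, not a tested answer from the thread: the helper name `flag_outliers` and its parameters `group_cols`, `value_col` and `n_std` are made up for illustration, and it assumes the column names from the question. It relies on `groupby(...).transform`, which returns a result aligned to the original rows, so no explicit loop or `pd.concat` is needed.

```python
import numpy as np
import pandas as pd


def flag_outliers(df, group_cols=("Country", "Car"), value_col="Price", n_std=3.0):
    """Return a copy of df with a boolean 'outlier' column.

    Hypothetical helper: a row is flagged when its value is more than
    n_std sample standard deviations away from the mean of its group.
    """
    grouped = df.groupby(list(group_cols))[value_col]
    # transform() broadcasts each group's statistic back onto its rows
    group_mean = grouped.transform("mean")
    group_std = grouped.transform("std")  # sample std (ddof=1), as in Series.std()
    out = df.copy()
    # groups with a single row have NaN std, so the comparison yields False
    out["outlier"] = (out[value_col] - group_mean).abs() > n_std * group_std
    return out


# Made-up example data: one extreme price in the first group
example = pd.DataFrame({
    "Country": ["HU"] * 20 + ["DE"] * 5,
    "Car": ["Suzuki"] * 20 + ["Opel"] * 5,
    "Price": [100.0] * 19 + [1000.0] + [10.0, 11.0, 12.0, 13.0, 14.0],
})
flagged = flag_outliers(example)
```

Unlike the nested loop, this keeps the rows in their original order and only computes statistics for country/car combinations that actually occur in the data. Note that with the sample standard deviation a group needs at least 11 rows before any single value can exceed 3 standard deviations, so small groups will never produce a flag.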
