print outliers from scatterplot pandas

Question

After plotting two columns of a DataFrame as a scatter plot, there is clearly an outlier, given by the last row of the DataFrame. I try to print it, but this code always prints 'no outliers'. It seems pretty simple, but I can't understand why the code doesn't detect this outlier.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = [[10, 10],
        [15, 15],
        [14, 14],
        [16, 16],
        [19, 19],
        [17, 17],
        [6, 6],
        [5, 5],
        [20, 20],
        [22, 22],
        [21, 21],
        [18, 45]]
df = pd.DataFrame(data, columns=['x','y'])

plt.scatter(df['x'],df['y'])
plt.show()

if 17<df['x'].any()<19 and 42<df['y'].any()<48:
    print(df['x'], df['y'])
else:
    print('no outliers')

The problem is that `df['x'].any()` returns `True`. If you ask for `17 — mosc9575, Mar 10 '21 at 10:55
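
To illustrate the point of this comment, here is a minimal sketch on a two-row stand-in for the question's data (the small frame and the name `any_x` are only for illustration). `Series.any()` collapses the whole column to a single boolean, and in a comparison that boolean behaves like the integer 1, so the range test can never succeed:

```python
import pandas as pd

# Two-row stand-in for the question's DataFrame (illustrative only).
df = pd.DataFrame({'x': [10, 18], 'y': [10, 45]})

# Series.any() reduces the column to one boolean:
# True if at least one element is truthy (non-zero).
any_x = df['x'].any()
print(bool(any_x))            # True

# True compares like the integer 1, so the chained test is
# effectively 17 < 1 < 19, which is False.
print(bool(17 < any_x < 19))  # False
```

The values in the column never take part in the comparison, which is why the `else` branch always runs.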

score 0 · Answer 1 · answered Mar 10 '21 at 10:53

Use Series.between on each column, combine the two boolean masks with & (element-wise AND), and filter the rows with boolean indexing:

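# Boolean mask: rows with 17 < x < 19 and 42 < y < 48 (bounds excluded).
# pandas >= 1.3 expects inclusive='neither'; older versions used inclusive=False.
m = df['x'].between(17, 19, inclusive='neither') & df['y'].between(42, 48, inclusive='neither')

if m.any():
    df1 = df[m]
    print(df1)
else:
    print('no outliers')

Output:

     x   y
11  18  45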

jezrael


Thank you. I have one more question: would it be possible to include more parameters in the code, in case I find more outliers that I want to print? Or do I have to repeat the code for each outlier? – d8a988 Mar 10 '21 at 14:24
@d8a988 - Not 100% sure I understand, but [here](https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-data-frame) are some other ways to detect outliers; I hope it helps. – jezrael Mar 10 '21 at 14:26
What I mean is: how would it be possible to add more outliers in the first line, m=df['x'] etc.? For instance, if I detect additional outliers in the scatterplot, I would like to be able to add something like "& {df['x'].between(1, 3, inclusive=False) & df['y'].between(5, 6, inclusive=False)}" to the first line of your code, but I am not sure which is the correct way. – d8a988 Mar 11 '21 at 19:31
@d8a988 - If you need to remove the numbers in between, then yes, it should work well. – jezrael Mar 12 '21 at 05:01
I am unfortunately having another issue with outliers. Would you perhaps know what the issue is? Here is the link: https://stackoverflow.com/questions/66598118/python-outliers-remain-in-the-scatterplot-even-after-removal – d8a988 Mar 15 '21 at 09:39
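
On the follow-up about printing several outliers at once, one possible approach is sketched below. It is not taken from the answer: the `regions` list and the names `mask` and `in_region` are hypothetical, and `inclusive='neither'` assumes pandas 1.3 or later. Each region gets its own mask, and the masks are joined with | (element-wise OR), since joining different regions with & would require a single row to lie inside every region at once.

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 15, 14, 16, 19, 17, 6, 5, 20, 22, 21, 18],
                   'y': [10, 15, 14, 16, 19, 17, 6, 5, 20, 22, 21, 45]})

# Hypothetical regions read off the plot: (x_low, x_high, y_low, y_high).
regions = [(17, 19, 42, 48), (1, 3, 5, 6)]

# Start with "no row selected" and OR in one mask per region.
mask = pd.Series(False, index=df.index)
for x_lo, x_hi, y_lo, y_hi in regions:
    in_region = (df['x'].between(x_lo, x_hi, inclusive='neither')
                 & df['y'].between(y_lo, y_hi, inclusive='neither'))
    mask |= in_region

outliers = df[mask]
print(outliers if mask.any() else 'no outliers')
```

With the question's data only the first region matches a row (index 11); the second region is there only to show how further ranges would be added.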
