I have a list of values, and I would like to extract the outliers from it.

import numpy as np

list_of_values = [2, 3, 100, 5, 53, 5, 4, 7]

def detect_outlier(data):
    # Flag values whose absolute z-score exceeds the threshold
    threshold = 3
    mean_1 = np.mean(data)
    std_1 = np.std(data)

    outliers = [y for y in data if np.abs((y - mean_1) / std_1) > threshold]

    return outliers

print(detect_outlier(list_of_values))

However, the function prints an empty list, i.e. []. Any ideas?

  • Your `std_1` will be large because the list contains a few large values, so the test statistic is already biased. Typically you would use the interquartile range (a robust statistic) for this. The z-score formula works once you have enough samples that a single value such as 100 no longer skews things. – Chinny84 Jul 05 '22 at 14:48
  • Is this answering your question: https://stackoverflow.com/questions/11686720/is-there-a-numpy-builtin-to-reject-outliers-from-a-list ? – Dan Constantinescu Jul 05 '22 at 14:52
  • Looking at it further: with the data above, the largest z-score (for the value 100) is about 2.3, which is less than 3. – Chinny84 Jul 05 '22 at 14:55
  • Thanks! I have tried to make the function less biased by using the median instead of the mean, and I now get values in the result. However, I am not sure whether it is okay to simply swap in the median: `np.abs((y - median_1)/std_1) > threshold`. Do you know? – Olovia.solfjell Jul 06 '22 at 07:31
  • If you are going to use robust statistics, you also need the robust equivalent of the standard deviation, such as the interquartile range. If you have plenty of values to form a distribution, the z-score will be fine, but for the example above it makes more sense not to use it. In machine learning, outliers can be defined like [this](https://online.stat.psu.edu/stat200/lesson/3/3.2) – Chinny84 Jul 06 '22 at 08:43
  • Thanks, I changed to using iqr. – Olovia.solfjell Jul 06 '22 at 09:42
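
The IQR approach suggested in the comments could look roughly like the sketch below. It is an illustration, not the code the asker ended up with: the function name `detect_outliers_iqr` and the multiplier `k=1.5` (the common Tukey fence convention) are assumptions, and the quartiles use NumPy's default linear interpolation.

```python
import numpy as np

def detect_outliers_iqr(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    Hypothetical helper; k=1.5 is the conventional fence, not a value
    taken from the thread.
    """
    values = np.asarray(data, dtype=float)
    # First and third quartiles (NumPy's default linear interpolation)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Keep the original elements, in their original order
    return [y for y in data if y < lower or y > upper]

list_of_values = [2, 3, 100, 5, 53, 5, 4, 7]
print(detect_outliers_iqr(list_of_values))  # [100, 53]
```

For this list the quartiles are 3.75 and 18.5, so the fences sit at -18.375 and 40.625, and both 53 and 100 fall above the upper fence.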

1 Answer

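Since mean_1 = 22.375 and std_1 ≈ 33.41, even the largest deviation in list_of_values gives a z-score below the threshold: |100 - 22.375| / 33.41 ≈ 2.32, which is less than 3. No element passes the filter, so the list comes back empty.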
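
To check this, the z-scores can be computed directly. This is a minimal sketch using the data from the question; the variable names `values` and `z_scores` are introduced here for illustration.

```python
import numpy as np

list_of_values = [2, 3, 100, 5, 53, 5, 4, 7]
values = np.asarray(list_of_values, dtype=float)

# Population standard deviation (ddof=0), the same default np.std uses
# in the question's function
z_scores = np.abs(values - values.mean()) / values.std()

print(values.mean(), values.std())  # mean is 22.375, std is roughly 33.41
print(z_scores.max())               # roughly 2.32, below the threshold of 3
print(values[z_scores > 3])         # empty array: nothing is flagged
```

With only eight values, one extreme point inflates the standard deviation enough to hide itself, which is why the threshold of 3 is never reached here.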

Lucas Meier