Determine outlier from non normal distribution vector

Question

I would like to detect outliers in a list whose values do not follow a normal distribution.

list = [0, 1, 2, 3, 2, 1, ..., 2, 50, 100, 101, 102, 103, 101, ... 100, 150]

outlier = [50, 150]

Any ideas?

tafaust · Answer 1 · 2018-08-20T12:33:23.843

0

One idea is to fit a probability density function (pdf) to your data (see this link). You can then look at how likely each value is under the estimated density. Any data points that fall in regions of very low estimated density may be considered outliers.
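As a minimal sketch of that idea, the snippet below uses a leave-one-out Gaussian kernel density estimate written with the standard library only. The sample list is a shortened, hypothetical stand-in for the data in the question (the elided values are not filled in), and the bandwidth and density threshold are assumptions you would need to tune for real data.

```python
import math

def loo_kde_density(data, i, bandwidth):
    """Leave-one-out Gaussian kernel density estimate at data[i].

    The point itself is excluded so that an isolated value does not
    "vouch" for its own density.
    """
    x = data[i]
    total = 0.0
    for j, xj in enumerate(data):
        if j == i:
            continue
        z = (x - xj) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / ((len(data) - 1) * bandwidth)

def kde_outliers(data, bandwidth=3.0, min_density=1e-3):
    """Return the values whose estimated density is below min_density.

    bandwidth and min_density are illustrative assumptions, not
    recommended defaults.
    """
    return [
        x for i, x in enumerate(data)
        if loo_kde_density(data, i, bandwidth) < min_density
    ]

# Hypothetical sample shaped like the list in the question.
sample = [0, 1, 2, 3, 2, 1, 2, 50, 100, 101, 102, 103, 101, 100, 150]
outliers = kde_outliers(sample)
print(outliers)  # [50, 150]
```

The leave-one-out variant is a design choice: with a plain KDE every point contributes to its own density, which can hide isolated values when the sample is small.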

EDIT: Given your data, it also looks like you could fit a mixture of two normal distributions, as explained in this article, with roughly µ1 = 2, µ2 = 102 and σ1 = σ2 = 3. I'd suggest estimating these values empirically, however (or, if you already know each Gaussian, just take the values from there). Finally, you might check whether your pdf is indeed multimodal: presumably d here denotes the separation |µ1 − µ2| / (2σ) of two equally weighted Gaussians with a common σ, in which case the mixture is bimodal iff d > 1.
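A rough sketch of the two-Gaussian variant, using only the standard library. It plugs in the approximate values from this answer (µ1 = 2, µ2 = 102, σ = 3) rather than estimating them, assumes equal weights for the two components, and uses a hypothetical density threshold. The separation d is computed as |µ1 − µ2| / (2σ), which is my assumed reading of d for the equal-σ, equal-weight case. The sample is again a shortened, hypothetical stand-in for the question's list.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, mu1, mu2, sigma, weight1=0.5):
    """Density of a two-component Gaussian mixture with a common sigma."""
    return (weight1 * normal_pdf(x, mu1, sigma)
            + (1.0 - weight1) * normal_pdf(x, mu2, sigma))

def mixture_outliers(data, mu1, mu2, sigma, min_density=1e-3):
    """Return values that are very unlikely under the mixture.

    min_density is an illustrative assumption and should be tuned.
    """
    return [x for x in data if mixture_pdf(x, mu1, mu2, sigma) < min_density]

# Rough parameters taken from the answer; estimate them in practice.
MU1, MU2, SIGMA = 2.0, 102.0, 3.0

# Assumed definition of the separation d (equal sigma, equal weights).
separation = abs(MU1 - MU2) / (2 * SIGMA)

# Hypothetical sample shaped like the list in the question.
sample = [0, 1, 2, 3, 2, 1, 2, 50, 100, 101, 102, 103, 101, 100, 150]
flagged = mixture_outliers(sample, MU1, MU2, SIGMA)

print(separation > 1)  # True, the two modes are well separated
print(flagged)         # [50, 150]
```

If the parameters are not known in advance, they could be estimated from the data first (for example with an EM fit), ideally in a way that is robust to the very outliers you are trying to find.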

Does that help you? In case it does not, please let us/me know why!

Cheers

edited Aug 20 '18 at 12:33

answered Aug 20 '18 at 12:23


Thanks for the accurate and quick answer! I think I now have the elements I need to solve my problem ;) – TheHadrien Aug 20 '18 at 13:34
Hey @TheHadrien ! Glad my answer was helpful to you. Could you please still select it as helpful? Thank you! – tafaust Aug 20 '18 at 14:57
