I would like to detect outliers in a list whose data don't follow a normal distribution.
list = [0, 1, 2, 3, 2, 1, ..., 2, 50, 100, 101, 102, 103, 101, ... 100, 150]
outlier = [50, 150]
any ideas?
One idea is to estimate the probability density function (pdf) of your data (see this link). You can then look at how likely each value is under the estimated function. Any data points that fall where the estimated density is very low may be considered outliers.
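To make the density idea concrete, here is a minimal sketch using a hand-rolled Gaussian kernel density estimate. The sample values only mimic the shape of the list in the question (the elided parts are not known), and the bandwidth of 3.0, the cutoff of 1e-3 and the function names are assumptions picked for illustration, not recommended settings. Each point is scored with a leave-one-out estimate so that an isolated value does not prop up its own density.

```python
import math

# Hypothetical sample shaped like the list in the question
# (the real list is elided there, so these values are made up).
values = [0, 1, 2, 3, 2, 1, 2, 3, 1, 2, 50,
          100, 101, 102, 103, 101, 102, 100, 101, 102, 100, 150]


def loo_kde_density(data, i, bandwidth):
    """Leave-one-out Gaussian KDE density evaluated at data[i]."""
    x = data[i]
    total = 0.0
    for j, xj in enumerate(data):
        if j == i:
            continue  # skip the point itself
        z = (x - xj) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / ((len(data) - 1) * bandwidth)


def kde_outliers(data, bandwidth=3.0, min_density=1e-3):
    """Return the values whose leave-one-out density is below min_density.

    bandwidth and min_density are assumed, hand-picked values.
    """
    return [x for i, x in enumerate(data)
            if loo_kde_density(data, i, bandwidth) < min_density]


print(kde_outliers(values))
```

The bandwidth matters a lot here: a rule of thumb based on the overall standard deviation would smooth the two clusters together, so for data like this it probably needs to be set on the scale of a single cluster.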
EDIT: Also, given your data, it looks like you could fit them with a mixture of two normal distributions, as explained in this article, with roughly µ1 = 2, µ2 = 102 and σ1 = σ2 = 3. I'd suggest, however, estimating these values empirically (or, given that you know each Gaussian, just taking the values from there). Finally, you might check whether your pdf is indeed multimodal, which is the case iff d > 1.
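As a rough sketch of the two-Gaussian idea, the snippet below plugs in the approximate values from above (µ1 = 2, µ2 = 102, σ1 = σ2 = 3) and flags values whose mixture density is very low. Several things are assumptions on my part: the equal weights of 0.5, the cutoff (the density three standard deviations away from a mean), the made-up sample values, and the reading of d as the separation |µ1 − µ2| / (2σ), which is not defined in the text above.

```python
import math

# Rough parameters quoted in the answer; in practice estimate them from data.
MU1, MU2, SIGMA = 2.0, 102.0, 3.0
W1 = W2 = 0.5  # assumption: both components carry equal weight


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x):
    """Density of the two-component Gaussian mixture at x."""
    return W1 * normal_pdf(x, MU1, SIGMA) + W2 * normal_pdf(x, MU2, SIGMA)


# Assumed cutoff: the mixture density three sigmas away from the first mean.
CUTOFF = mixture_pdf(MU1 + 3 * SIGMA)

# Hypothetical sample shaped like the list in the question.
sample = [0, 1, 2, 3, 2, 1, 2, 3, 1, 2, 50,
          100, 101, 102, 103, 101, 102, 100, 101, 102, 100, 150]

mixture_outliers = [x for x in sample if mixture_pdf(x) < CUTOFF]

# Assumed meaning of d: distance between the means in units of 2 * sigma.
separation = abs(MU1 - MU2) / (2 * SIGMA)

print(mixture_outliers, separation)
```

With real data you would fit the means, standard deviations and weights first (for example with an EM-based mixture fit) rather than hard-coding them as done here.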
Does that help you? In case it does not, please let us/me know why!
Cheers