I am trying to find outliers in a dataset that I created myself in order to understand the topic. It is a simple Python list, but I am not getting the expected result. I am using Google Colab. I am relying on the idea that, in a normal distribution, values more than three standard deviations from the mean are mostly outliers.

The code is given below:

import numpy as np

df2 = [12, 13, 14, 15, 10, 12, 14, 15, 1007, 12, 14, 17, 18, 1005, 14, 15, 16, 17, 13, 14, 1100, 12, 13, 14, 15]

def detect_outliers(data):
    threshold = 3  # flag values more than 3 standard deviations from the mean
    mean = np.mean(data)
    standard_deviation = np.std(data)

    outliers = []  # local list, so repeated calls do not accumulate results
    for i in data:
        z_score = (i - mean) / standard_deviation
        if np.abs(z_score) > threshold:
            outliers.append(i)

    return outliers

print(detect_outliers(df2))

The output I get is an empty list: []

  • None of the `(i-mean)/standard_deviation` values is above 3, so it makes sense that you get an empty outliers list. If you remove 1 or 2 of the large values (above 1000), the remaining large values become isolated and can be identified as outliers by your method – Yacine Hajji Jan 18 '23 at 15:38
  • As another example, you could increase the number of values in the pool around 10 and keep only 2 or 3 extremely large values, so that the mean sits closer to the pool than to the extreme values. With such a set, you will detect the extreme values as outliers. Finally, I would plot `df2` against `abs(i-mean)/standard_deviation` so that you can see graphically what is happening – Yacine Hajji Jan 18 '23 at 15:43
  • It worked. I had just picked those numbers at random and thought that, since three numbers are much larger than the others in the list, all three would be displayed. – Partha Pratim Sarma Jan 18 '23 at 15:43
  • Also be careful with what you do with outliers. Outliers aren't necessarily absurd values, they can just reflect a specific distribution (e.g. log-normal). You usually need a rationale to call an outlier 'an absurd value' (e.g. device error, fraud, under-training) – Yacine Hajji Jan 18 '23 at 15:46
  • Hey, can you help me with another problem related to Anaconda? – Partha Pratim Sarma Jan 18 '23 at 15:48
  • I can, but you must create a new topic and add the tag `Python`. This topic would also have gained visibility with that tag. – Yacine Hajji Jan 18 '23 at 16:19
  • I have actually already posted my question. Here is the link: https://stackoverflow.com/questions/74946960/anaconda-prompt-is-having-an-issue-with-ssl-certificates – Partha Pratim Sarma Jan 18 '23 at 16:22
  • Unfortunately, I don't have the knowledge for this question – Yacine Hajji Jan 18 '23 at 16:28
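
The explanation given in the comments can be checked numerically. The sketch below is only an illustration, not code from the thread: it uses the standard library's `statistics.pstdev`, which is the population standard deviation and therefore matches the default behaviour of `np.std`. The helper `max_abs_z` and the variable names `small`, `large`, `fewer_extremes` and `bigger_pool` are made up for this example. It shows that the three large values inflate the mean and the standard deviation so much that no z-score reaches 3, and that both suggestions (fewer extreme values, or a larger pool of ordinary values) let the same rule detect them.

```python
from statistics import fmean, pstdev


def detect_outliers(data, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = fmean(data)
    std = pstdev(data)  # population standard deviation, like np.std's default
    return [x for x in data if abs(x - mean) / std > threshold]


def max_abs_z(data):
    """Largest absolute z-score in `data` (hypothetical helper for inspection)."""
    mean = fmean(data)
    std = pstdev(data)
    return max(abs(x - mean) / std for x in data)


df2 = [12, 13, 14, 15, 10, 12, 14, 15, 1007, 12, 14, 17, 18, 1005,
       14, 15, 16, 17, 13, 14, 1100, 12, 13, 14, 15]

small = [x for x in df2 if x < 1000]   # the 22 ordinary values
large = [x for x in df2 if x >= 1000]  # the 3 extreme values

# Original data: the extremes are 3 of 25 points, so they pull the mean up
# and widen the standard deviation; the largest z-score stays below 3.
print(round(max_abs_z(df2), 2))
original = detect_outliers(df2)

# First suggestion: keep only one extreme value.
fewer_extremes = detect_outliers(small + [1100])

# Second suggestion: enlarge the pool of ordinary values, keep all extremes.
bigger_pool = detect_outliers(small * 3 + large)

print(original)
print(fewer_extremes)
print(bigger_pool)
```

With the original list the result is empty, while the two modified lists report the extreme values. The 3-standard-deviation rule works best when the outliers are a small fraction of the sample, because the mean and standard deviation are themselves computed from the data being tested.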

0 Answers