-1

I am trying to remove outliers from a Python list, but my code removes only the first one (190000) and not the second (200000). What is the problem?

import statistics
dataset = [25000, 30000, 52000, 28000, 150000, 190000, 200000]

def detect_outlier(data_1):
    threshold = 1
    mean_1 = statistics.mean(data_1)
    std_1 = statistics.stdev(data_1)
    #print(std_1)
    for y in data_1:
        z_score = (y - mean_1)/std_1
        print(z_score)
        if abs(z_score) > threshold:
            dataset.remove(y)
    return dataset  
dataset = detect_outlier(dataset)
print(dataset)

Output:

[25000, 30000, 52000, 28000, 150000, 200000]
lf_celine

2 Answers

2

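This happens because data_1 and dataset refer to the same list object. When you remove an element from a list while a for loop is iterating over it, the remaining elements shift one position to the left, but the loop's internal index still advances, so the element that moved into the freed slot is skipped. Here, removing 190000 moves 200000 into its position, the loop steps past it, and the iteration ends. You should not modify a list while iterating over it.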
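The skipping can be reproduced in isolation. The following is a minimal sketch, not the asker's code: the `values` and `visited` names and the `>= 190000` comparison (a stand-in for the z-score test) are assumptions made for illustration.

```python
# Three values from the question's dataset; the last two are the outliers.
values = [150000, 190000, 200000]
visited = []  # records which elements the loop actually sees

for v in values:
    visited.append(v)
    if v >= 190000:  # hypothetical stand-in for the z-score test
        values.remove(v)  # mutates the list being iterated over

print(visited)  # [150000, 190000]
print(values)   # [150000, 200000]
```

The loop never visits 200000: after 190000 is removed, 200000 sits at the index the loop has already passed.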

As a quick fix, call the function with a copy of the list. The loop then iterates over the copy, while remove() still acts on the original dataset:

dataset = detect_outlier(dataset[:])
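For reference, here is one way the whole function might look if the copy is taken inside it, so that it no longer depends on the global dataset. This is a sketch under assumptions: the `threshold` parameter and the shortened variable names are not in the original code.

```python
import statistics

def detect_outlier(data, threshold=1):
    # threshold as a parameter is an assumption; the original hard-codes 1.
    mean = statistics.mean(data)
    std = statistics.stdev(data)  # sample standard deviation
    # Iterate over a shallow copy so removals from `data` cannot shift
    # the elements the loop has yet to visit.
    for y in data[:]:
        if abs((y - mean) / std) > threshold:
            data.remove(y)
    return data

dataset = [25000, 30000, 52000, 28000, 150000, 190000, 200000]
print(detect_outlier(dataset))  # [25000, 30000, 52000, 28000, 150000]
```

Note that this version still modifies the list passed in; it only makes the iteration safe.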
Emrah Tema
1
import statistics

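def detect_outlier(data_1):
    # Keep only the values whose absolute z-score is within the threshold.
    threshold = 1
    mean_1 = statistics.mean(data_1)
    std_1 = statistics.stdev(data_1)  # sample standard deviation
    # Build a new list instead of removing from data_1 while iterating over it.
    result_dataset = [y for y in data_1 if abs((y - mean_1) / std_1) <= threshold]

    return result_dataset


if __name__ == "__main__":
    dataset = [25000, 30000, 52000, 28000, 150000, 190000, 200000]
    result_dataset = detect_outlier(dataset)
    print(result_dataset)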