0

I'm trying to find the average value of a list, but sometimes the list contains large numbers that skew the average.

list = [215,255,210,205,450,315,235,250,450,250,250,250,210,450,210]

I would expect the average to fall between 210 and 250, but numbers such as 450 and 315 push it higher. How can I automatically remove dominant values like 450 and easily find a representative average?

Zenast
  • 2
    There's nothing special about doing it in Python compared to doing it in any other language (except using some idioms like list comprehension to e.g. filter out those values). You would first filter out (reject) outliers and then calculate average. Depending on your data set size and how you want to mark something as an outlier you might find data science toolkits for Python like pandas, numpy or scipy useful. See this answer for further info: https://stackoverflow.com/questions/11686720/is-there-a-numpy-builtin-to-reject-outliers-from-a-list – blami Apr 21 '20 at 05:33
  • 3
    By definition, the "average" is `sum(list) / len(list)`, and that's the way to calculate it. If you want to exclude values from that calculation for being "large" (or "small", for that matter), and you have decided beforehand what that means, you can do `new_list = [i for i in list if small < i < large]` followed by `sum(new_list) / len(new_list)`. However, if you want a more objective criterion for what "small" or "large" means, you can use scipy.stats to fit a normal distribution and exclude "unlikely" values (values with point-percentile function less than a given value). – Chris Apr 21 '20 at 05:35

1 Answer

0

The dominant factors you are describing are called 'outliers': values that are abnormally high or low compared with the rest of the dataset. You can use z-scores to remove these outliers from your data:

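  from scipy.stats import zscore

  list1 = [215, 255, 210, 205, 450, 315, 235, 250, 450, 250, 250, 250, 210, 450, 210]
  scores = zscore(list1)  # how many standard deviations each value lies from the mean
  threshold = 1  # 3 is the usual cutoff, but no value here exceeds it (450 scores about 1.9)
  # keep only the values whose absolute z-score is within the threshold
  list1 = [value for value, z in zip(list1, scores) if abs(z) <= threshold]
  average = sum(list1) / len(list1)  # mean of the remaining values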

You can change the threshold as you see fit and inspect the resulting list1 to decide on a value (try several thresholds between 0 and 3). For more on z-scores: outlier detection
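Trying several thresholds, as suggested above, can be sketched like this. It is only an illustration and not part of the code above: the helper name `mean_without_outliers` is made up for this example, and it computes the z-scores with the standard library's `statistics` module instead of scipy (using the population standard deviation, which is what `scipy.stats.zscore` uses by default).

```python
from statistics import mean, pstdev


def mean_without_outliers(values, threshold):
    """Average of the values whose absolute z-score is at most `threshold`.

    Hypothetical helper for illustration; returns None if nothing survives.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (ddof=0)
    if sigma == 0:
        return mu  # all values are identical, so there is nothing to reject
    kept = [v for v in values if abs(v - mu) / sigma <= threshold]
    return mean(kept) if kept else None


data = [215, 255, 210, 205, 450, 315, 235, 250, 450, 250, 250, 250, 210, 450, 210]

# Compare the filtered average for a few candidate thresholds.
for threshold in (0.5, 1, 2, 3):
    print(threshold, mean_without_outliers(data, threshold))
```

For this particular list, a threshold of 1 drops only the three 450s, while a threshold of 2 or 3 keeps every value, so the choice of threshold matters a lot on small samples.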

Mehul Gupta