1

This code works. But I can't help but feel it's a hack, especially the "offset" part. I had to put that in there because otherwise all the index values in deletes are shifted by one every time I do a del operation.

    # remove outliers > devs # of std deviations
    devs = 1
    deletes = []
    for num, duration in enumerate(durations):
        if (duration > (mean_duration + (devs * std_dev_one_test))) or \
            (duration < (mean_duration - (devs * std_dev_one_test))):
            deletes.append(num)
    offset = 0
    for delete in deletes:
        del durations[delete - offset]
        del dates[delete - offset]
        offset += 1

Ideas on how to make it better?

Aaron
  • `(duration > (mean_duration + (devs * std_dev_one_test))) or (duration < (mean_duration - (devs * std_dev_one_test)))` simplifies to `abs(duration-mean_duration) > devs * std_dev_one_test`, without losing any readability. – PaulMcG Jul 07 '12 at 07:22

4 Answers

4

Build a list of keepers as you iterate over the list:

    def isKeeper(duration):
        # Keep a duration only if it lies within devs standard deviations of the mean.
        lower = mean_duration - (devs * std_dev_one_test)
        upper = mean_duration + (devs * std_dev_one_test)
        return lower <= duration <= upper

    # Rebuild the list from the values that pass the test.
    durations = [duration for duration in durations if isKeeper(duration)]
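The question also maintains a parallel `dates` list, which a filter over `durations` alone would leave out of sync. Here is a minimal sketch of filtering both lists together, assuming they have equal length. The helper name `remove_outlier_pairs` and the sample values are hypothetical, chosen only for illustration:

```python
def remove_outlier_pairs(durations, dates, mean_duration, std_dev, devs=1):
    """Return (durations, dates) with outlier entries removed from both."""
    kept = [
        (duration, date)
        for duration, date in zip(durations, dates)
        if abs(duration - mean_duration) <= devs * std_dev
    ]
    # Unzip the surviving pairs back into two parallel lists.
    kept_durations = [duration for duration, _ in kept]
    kept_dates = [date for _, date in kept]
    return kept_durations, kept_dates


# Hypothetical sample data, for illustration only.
durations = [10, 11, 50, 9]
dates = ["mon", "tue", "wed", "thu"]
durations, dates = remove_outlier_pairs(
    durations, dates, mean_duration=10, std_dev=2
)
print(durations, dates)
```

Pairing the values with `zip` means no index bookkeeping is needed at all, so the offset problem never arises.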
Russell Borogove
3

Maybe something like this:

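    import numpy as np

    myList = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]

    mean_duration = np.mean(myList)
    std_dev_one_test = np.std(myList)

    def drop_outliers(x):
        # Return a boolean rather than x itself, so a legitimate value of 0 is not dropped.
        return abs(x - mean_duration) <= std_dev_one_test

    # In Python 3, filter() returns an iterator, so wrap it in list().
    myList = list(filter(drop_outliers, myList))

Result:

    >>> myList
    [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5]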
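If pulling in NumPy just for the mean and standard deviation feels heavy, the same filter can be sketched with the standard library. This assumes the population standard deviation is wanted, which is what `np.std` computes by default. The names `my_list` and `filtered` are introduced here for illustration:

```python
import statistics

my_list = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 3, 5, 99]

mean_duration = statistics.mean(my_list)
# pstdev is the population standard deviation, matching np.std's default (ddof=0).
std_dev_one_test = statistics.pstdev(my_list)

# Keep only the values within one standard deviation of the mean.
filtered = [x for x in my_list if abs(x - mean_duration) <= std_dev_one_test]
print(filtered)
```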
Akavall
1

Is the problem that you are deleting items from a list and it causes the index to shift and you are compensating with an offset?

If that's the case, then just delete from the back to the front; that way, deleting an item won't change the positions of the items you have yet to process.

So start iterating from the last item to the front of the list.
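A minimal sketch of that back-to-front deletion, assuming `deletes` holds the positions to remove as in the question. The helper name `delete_indices` and the sample values are hypothetical:

```python
def delete_indices(items, indices):
    """Delete the given positions from items in place, highest index first."""
    # Removing from the back leaves every smaller index still valid.
    for index in sorted(indices, reverse=True):
        del items[index]


# Hypothetical sample data, for illustration only.
durations = [10, 11, 50, 9, 70]
dates = ["mon", "tue", "wed", "thu", "fri"]
deletes = [2, 4]

delete_indices(durations, deletes)
delete_indices(dates, deletes)
print(durations, dates)
```

Because each deletion only shifts elements that come after it, no offset counter is needed.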

These SO questions might be of interest: Delete many elements of list (python) and Python: Removing list element while iterating over list

Another good SO discussion can be found here: Remove items from a list while iterating (thanks to @PaulMcGuire for the suggestion via the comments)

Levon
  • Here is another good discussion of this topic: http://stackoverflow.com/questions/1207406/remove-items-from-a-list-while-iterating-in-python, especially Alex Martelli's additional comments. – PaulMcG Jul 07 '12 at 07:25
  • @PaulMcGuire Thanks .. that is a good link, I will add it to my answer if you don't mind in case someone skips the comments. – Levon Jul 07 '12 at 10:32
0

If your data set is small you can just reverse your logic, and keep values instead of deleting them:

    # keep values within devs standard deviations of the mean
    devs = 1
    keeps = []
    for duration in durations:
        if (duration <= (mean_duration + (devs * std_dev_one_test))) and \
                (duration >= (mean_duration - (devs * std_dev_one_test))):
            keeps.append(duration)
Michael Anderson