I created a function to drop outliers from my dataframe. Here is the function:

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    #print(drop_index)
    train = train.drop(drop_index,axis = 0)

When I call it like this:

dropping_outliers(train, ((train.SalePrice<100000)  & (train.LotFrontage>150)))

nothing is dropped. However, when I do the steps manually, i.e. get the index of the row matching this condition, I get a valid index (943), and when I run

train = train.drop([943],axis = 0)

Then the row I want is dropped correctly. I don't understand why the function doesn't work, since it is supposed to do exactly what I am doing manually.
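A minimal reproduction of the behaviour, as a sketch: the toy dataframe is hypothetical (only the column names SalePrice and LotFrontage come from the question), and the function body is the one from the question. The function returns None and the caller's dataframe keeps all its rows.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
train = pd.DataFrame({
    "SalePrice": [208500, 90000, 181500],
    "LotFrontage": [65.0, 160.0, 80.0],
})

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    # Rebinds the *local* name `train`; the caller's object is untouched.
    train = train.drop(drop_index, axis=0)

result = dropping_outliers(train, (train.SalePrice < 100000) & (train.LotFrontage > 150))
print(result)      # None: the function has no return statement
print(len(train))  # 3: the caller's dataframe still has every row
```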

user3234112

1 Answer

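At the end of dropping_outliers, the result of drop is assigned to the local variable train, which does not alter the dataframe passed in: drop returns a new dataframe by default, and rebinding the parameter name inside the function has no effect on the caller. Try returning the result instead: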

def dropping_outliers(train, condition):
    drop_index = train[condition].index
    # Return the new dataframe instead of rebinding the local name
    return train.drop(drop_index, axis=0)

Then assign the result when you call the function:

train = dropping_outliers(train, (train.SalePrice < 100000) & (train.LotFrontage > 150))
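A short sketch of the corrected function in use, on a hypothetical toy dataframe (only the column names come from the question). It also shows a common alternative idiom that is not part of the original answer: keeping the rows where the boolean mask is False with train[~mask], which gives the same result here.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
train = pd.DataFrame({
    "SalePrice": [208500, 90000, 181500],
    "LotFrontage": [65.0, 160.0, 80.0],
})

def dropping_outliers(train, condition):
    """Return a new dataframe without the rows where `condition` is True."""
    drop_index = train[condition].index
    return train.drop(drop_index, axis=0)

outliers = (train.SalePrice < 100000) & (train.LotFrontage > 150)

# Alternative idiom (not from the original answer): keep the rows where
# the mask is False. Computed before `train` is rebound below.
kept = train[~outliers]

# Rebind the caller's name to the dataframe the function returns.
train = dropping_outliers(train, outliers)

print(train.index.tolist())  # [0, 2]
print(train.equals(kept))    # True
```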

Also see the question "python pandas dataframe, is it pass-by-value or pass-by-reference?".
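Python passes the function a reference to the same dataframe object, so mutating that object inside the function is visible to the caller, while assigning a new object to the parameter name is not. A minimal sketch of both cases, using drop's real inplace parameter; the helper names rebind and mutate are hypothetical. Returning the new dataframe, as in the answer, is arguably the clearer of the two styles.

```python
import pandas as pd

def rebind(df):
    # Assigning to `df` only changes what the local name points to.
    df = df.drop(df.index[:1])

def mutate(df):
    # inplace=True modifies the object the caller passed in (and returns None).
    df.drop(df.index[:1], inplace=True)

frame = pd.DataFrame({"x": [1, 2, 3]})
rebind(frame)
print(len(frame))  # 3: caller's dataframe unchanged
mutate(frame)
print(len(frame))  # 2: first row removed from the caller's dataframe
```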

Bill the Lizard