I am new to pandas, so could someone please give me some insight into the following?

I would like to drop some outliers and the following code does not work:

train = train.drop(['GrLivArea']>4000) & (['SalePrice']<300000)

When I try this instead, it works:

train = train.drop(train[(train['GrLivArea']>4000) & (train['SalePrice']<300000)].index)

Can someone please explain why I need to write 'train' again before each column name (e.g. GrLivArea) when I have already indicated the dataframe I am referring to by calling train.drop?

Thanks a lot.

Celius Stingher
Nora
  • The drop method needs the labels (index or column names) to remove. The first attempt has no reference to the dataframe: ['GrLivArea']>4000 compares a plain Python list with a number, so Python fails before drop is even called. The second attempt works because the boolean mask selects the rows to remove and .index passes their labels to drop, which can remove either rows (by index label) or columns. – sammywemmy Jan 28 '20 at 13:07
  • please share sample df – Bhosale Shrikant Jan 28 '20 at 13:09
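To make the point in the first comment concrete, here is a minimal sketch. The four-row `train` frame is a made-up stand-in for the real data (no sample was shared), and the names `error_name`, `mask` and `labels_to_drop` are only for illustration. It shows that the bare list comparison fails on its own, before drop is involved, while the same comparison on a DataFrame column yields a boolean Series.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; values are invented for illustration.
train = pd.DataFrame({
    "GrLivArea": [1500, 4500, 5000, 2000],
    "SalePrice": [200000, 150000, 700000, 250000],
})

# Python evaluates an argument before the method is called, so this is just
# a list compared with an int. It knows nothing about `train`.
try:
    ['GrLivArea'] > 4000
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Indexing the DataFrame returns a Series, and comparing a Series is element-wise.
mask = (train['GrLivArea'] > 4000) & (train['SalePrice'] < 300000)

# The index labels of the matching rows are what drop actually needs.
labels_to_drop = train[mask].index
```

In this toy frame only the second row (label 1) is both large and cheap, so it is the only label passed on to drop.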

1 Answer

I would use boolean indexing, negating the mask with ~ so that the outliers are removed rather than kept:

train = train[~((train['GrLivArea']>4000) & (train['SalePrice']<300000))]
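As a rough check, the sketch below compares the drop-based version from the question with boolean indexing on a small invented frame (the data and the names `dropped`, `filtered` and `same` are assumptions for illustration). Note that the mask has to be inverted with ~ when filtering, because indexing with the mask itself would keep only the outliers.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; values are invented for illustration.
train = pd.DataFrame({
    "GrLivArea": [1500, 4500, 5000, 2000],
    "SalePrice": [200000, 150000, 700000, 250000],
})

# True for rows that are outliers (large living area, low price).
mask = (train["GrLivArea"] > 4000) & (train["SalePrice"] < 300000)

# Version from the question: look up the outlier labels, then drop them.
dropped = train.drop(train[mask].index)

# Boolean indexing: keep every row where the mask is False.
filtered = train[~mask]

# Both approaches leave the same rows when the index labels are unique.
same = dropped.equals(filtered)
```

The two forms only differ if the index contains duplicate labels, since drop removes every row carrying a given label while the mask works row by row.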
Carsten