17

I have an array wrong_indexes_train which contains a list of indexes that I would like to remove from a dataframe:

[0, 63, 151, 469, 1008]

To remove these indexes, I am trying this:

df_train.drop(wrong_indexes_train)

However, the code fails with the error:

ValueError: labels ['OverallQual' 'GrLivArea' 'GarageCars' 'TotalBsmtSF' 'FullBath'
 'YearBuilt'] not contained in axis

Here, ['OverallQual' 'GrLivArea' 'GarageCars' 'TotalBsmtSF' 'FullBath' 'YearBuilt'] are the names of my dataframe's columns.

How could I just make the dataframe drop the entire rows of the indices that I specified?

bsky
  • 1
    Have a look at this. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html "Specifying both labels and index or columns will raise a ValueError." – Samir Dec 21 '17 at 21:33

4 Answers

21

Change it to

df_train.drop(wrong_indexes_train,axis=1)
Gabriel A
  • 1
    Actually, it's `axis=1` because I'm removing rows, not columns, but I figured out looking at your answer. – bsky Dec 21 '17 at 21:44
  • 2
    That's what I had at first then I edited it after the documentation said otherwise. Glad I could help. axis : int or axis name Whether to drop labels from the index (0 / ‘index’) or columns (1 / ‘columns’). – Gabriel A Dec 21 '17 at 21:45
  • 4
    i thought drop(rownumber) will drop row = rownumber. axis =1 means column. What is this dropping? – Nguai al Jan 11 '19 at 04:17
  • 1
    The above seems a bit confusing; axis=1 would indeed drop columns, so axis=0 or axis='index' would be the way to drop rows. Correct approach, but need to set that parameter correctly based on the use case. – T. Shaffner Apr 27 '21 at 16:33
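To make the axis discussion in these comments concrete, here is a minimal sketch on a hypothetical three-row frame (the column names are borrowed from the question, the values are made up). It only illustrates what the pandas documentation quoted above says: axis=0 (or 'index') matches labels against the row index, axis=1 (or 'columns') matches them against the column names.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({"OverallQual": [7, 6, 8],
                   "GrLivArea": [1710, 1262, 1786]})

# axis=0 / axis='index': the labels are looked up in the row index.
rows_dropped = df.drop([0, 2], axis=0)

# axis=1 / axis='columns': the labels are looked up in the column names.
cols_dropped = df.drop(["GrLivArea"], axis=1)

print(rows_dropped)
print(cols_dropped)
```

So for removing rows by index label, axis=0 (the default) should be the one to use.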
10

Not 100% certain what you want without a minimal (non-)working example, but you should specify an axis parameter. df.drop returns a new DataFrame with the rows removed and leaves the original unchanged. If you want to modify the original in place, specify inplace=True.

See this for symbolic row names (index):

import pandas as pd

df = pd.DataFrame({"ones":[1,3,5],
                   "tens":[20, 40, 60]},
                  index=['barb', 'mark', 'ethan'])
df.drop(['barb', 'mark'], axis='index')

And this for numeric (default) indices:

df = pd.DataFrame({"ones":[1,3,5],
                   "tens":[20, 40, 60]})
df.drop([0,2], axis='index')
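As a small sketch of the returned-copy versus inplace point, using the same toy frame (the variable names result, n_before and ret are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"ones": [1, 3, 5],
                   "tens": [20, 40, 60]})

# Without inplace, drop hands back a new DataFrame and df keeps all its rows.
result = df.drop([0, 2], axis='index')
n_before = len(df)

# With inplace=True, df itself is modified and the call returns None.
ret = df.drop([0, 2], axis='index', inplace=True)

print(len(result), n_before, ret, len(df))
```

If the result of a plain drop call is never assigned, the rows will appear not to have been removed.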
mirekphd
MrDrFenner
3

Try

df_train=df_train.reset_index() 

followed by

df_train = df_train.drop(wrong_indexes_train)

My guess is df_train doesn't have a numerical index right now, rather one of the columns ['OverallQual' 'GrLivArea' 'GarageCars' 'TotalBsmtSF' 'FullBath' 'YearBuilt'] is serving as the index.
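A rough sketch of that guess, on a made-up frame where YearBuilt has been set as the index (this setup is an assumption, the question does not show how df_train was built). Dropping integer labels that do not exist in the index fails, and reset_index turns the index back into an ordinary column with a fresh 0..n-1 index:

```python
import pandas as pd

# Hypothetical frame whose index is one of the columns named in the question.
df = pd.DataFrame({"GrLivArea": [1710, 1262, 1786]},
                  index=pd.Index([2003, 1976, 2001], name="YearBuilt"))

try:
    df.drop([0, 2])          # 0 and 2 are not labels in this index
    raised = False
except (KeyError, ValueError):   # the exception type depends on the pandas version
    raised = True

# After reset_index, YearBuilt is a regular column and the labels are 0, 1, 2.
df_reset = df.reset_index()
cleaned = df_reset.drop([0, 2])

print(raised)
print(cleaned)
```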

0

One can use DataFrame.drop for that.

To drop rows, one should use axis=0 or axis='index'. To drop columns, use axis=1 or axis='columns'.

For your specific case, one can do

wrong_indexes_train = [0, 63, 151, 469, 1008]

df_train.drop(wrong_indexes_train, axis=0, inplace=True)

or

df_train.drop([0, 63, 151, 469, 1008], axis=0, inplace=True)

One can also select the rows by position with DataFrame.index:

wrong_indexes_train = df_train.index[[0, 63, 151, 469, 1008]]

df_train.drop(wrong_indexes_train, inplace=True)
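Note that df_train.index[[...]] picks labels by position, so it only differs from passing the list directly when the index is not the default 0..n-1. A small sketch on a hypothetical frame with a shifted index:

```python
import pandas as pd

# Hypothetical frame whose labels (10..13) differ from the positions (0..3).
df = pd.DataFrame({"tens": [20, 40, 60, 80]}, index=[10, 11, 12, 13])

# Positions 0 and 2 are translated to labels 10 and 12 before dropping.
by_position = df.drop(df.index[[0, 2]])

# Equivalent call that names the labels directly.
by_label = df.drop([10, 12])

print(by_position)
```

With a default RangeIndex both forms should give the same result.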

On the other hand, assuming the dataframe and the number of rows to drop are considerably large, one might want to select the rows to keep instead (as Dennis Golomazov suggests here). For that, one might use Mad Physicist's approach:

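import numpy as np

wrong_indexes_train = [0, 63, 151, 469, 1008]

# Boolean mask with one entry per row, all True (keep) to start with
mask = np.ones(len(df_train), dtype=bool)

# Mark the unwanted row positions as False
mask[wrong_indexes_train] = False

# Keep only the rows whose mask entry is True
df_train_new = df_train.iloc[mask]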
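A label-based variant of the same keep-the-rows idea, which is not part of the approaches credited above, might build the mask with Index.isin. The frame and the name wrong are made up for illustration:

```python
import pandas as pd

# Hypothetical small frame with the default integer index.
df = pd.DataFrame({"tens": [20, 40, 60, 80]})
wrong = [0, 2]

# True for every row whose index label is NOT in the list of unwanted labels.
keep = ~df.index.isin(wrong)

# Boolean indexing keeps the rows marked True.
df_new = df[keep]

print(df_new)
```

Unlike drop, this does not complain if some of the listed labels are missing from the index, which may or may not be what one wants.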
Gonçalo Peres