Drop rows by index from dataframe

Question

I have an array wrong_indexes_train which contains a list of indexes that I would like to remove from a dataframe:

[0, 63, 151, 469, 1008]

To remove these indexes, I am trying this:

df_train.drop(wrong_indexes_train)

However, the code fails with the error:

ValueError: labels ['OverallQual' 'GrLivArea' 'GarageCars' 'TotalBsmtSF' 'FullBath'
 'YearBuilt'] not contained in axis

Here, ['OverallQual' 'GrLivArea' 'GarageCars' 'TotalBsmtSF' 'FullBath' 'YearBuilt'] are the names of my dataframe's columns.

How could I just make the dataframe drop the entire rows of the indices that I specified?

Have a look at this. https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html "Specifying both labels and index or columns will raise a ValueError." — Samir, Dec 21 '17 at 21:33
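The documentation linked in the comment above also describes an `index=` keyword, which names the row axis directly instead of relying on `axis`. A minimal sketch, assuming pandas 0.21 or later; `df_demo` and its values are made up for illustration, with only the column names borrowed from the question:

```python
import pandas as pd

# Hypothetical stand-in for df_train (values are invented).
df_demo = pd.DataFrame({
    "OverallQual": [7, 6, 7, 8],
    "GrLivArea": [1710, 1262, 1786, 1717],
    "GarageCars": [2, 2, 2, 3],
})

# index= targets row labels, so no axis argument is needed.
rows_dropped = df_demo.drop(index=[0, 2])

# Passing positional labels together with index= is rejected,
# which is the ValueError the quoted sentence refers to.
try:
    df_demo.drop([0], index=[2])
    raised = False
except ValueError:
    raised = True
```

Using `index=` or `columns=` tends to read more clearly than remembering which number belongs to which axis.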

Gabriel A · Accepted Answer · score 21 · 2017-12-21T21:44:47.080

Change it to

df_train.drop(wrong_indexes_train,axis=1)

edited Dec 21 '17 at 21:44

answered Dec 21 '17 at 21:32

Gabriel A

Actually, it's `axis=0` because I'm removing rows, not columns, but I figured it out by looking at your answer. – bsky Dec 21 '17 at 21:44

That's what I had at first then I edited it after the documentation said otherwise. Glad I could help. axis : int or axis name Whether to drop labels from the index (0 / ‘index’) or columns (1 / ‘columns’). – Gabriel A Dec 21 '17 at 21:45

i thought drop(rownumber) will drop row = rownumber. axis =1 means column. What is this dropping? – Nguai al Jan 11 '19 at 04:17

The above seems a bit confusing; axis=1 would indeed drop columns, so axis=0 or axis='index' would be the way to drop rows. Correct approach, but need to set that parameter correctly based on the use case. – T. Shaffner Apr 27 '21 at 16:33
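To make the axis discussion in these comments concrete, here is a small sketch on an invented frame (`df_demo` is hypothetical, not the asker's data). It assumes a default integer index, so the row labels are 0, 1, 2:

```python
import pandas as pd

df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# axis=0 (or axis='index') removes rows by index label.
without_rows = df_demo.drop([0, 2], axis=0)

# axis=1 (or axis='columns') removes columns by name.
without_col = df_demo.drop(["b"], axis=1)

# With axis=1, the integers are looked up among the column names and fail.
# Recent pandas raises KeyError; older releases raised ValueError.
try:
    df_demo.drop([0, 2], axis=1)
    axis1_failed = False
except (KeyError, ValueError):
    axis1_failed = True
```

So for the row labels in the question, `axis=0` is the setting that applies.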

score 10 · Answer 2 · edited Apr 27 '20 at 18:14

I'm not 100% certain what you want without a minimal (non-)working example, but you should specify the axis parameter. df.drop returns a new DataFrame with the given labels removed and leaves the original unchanged. If you want to modify the original in place, specify inplace=True.

See this for symbolic row names (index):

import pandas as pd

df = pd.DataFrame({"ones":[1,3,5],
                   "tens":[20, 40, 60]},
                  index=['barb', 'mark', 'ethan'])
df.drop(['barb', 'mark'], axis='index')

And this for numeric (default) indices:

df = pd.DataFrame({"ones":[1,3,5],
                   "tens":[20, 40, 60]})
df.drop([0,2], axis='index')
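The difference between the returned frame and `inplace=True` can be checked with a short sketch. It reuses the toy data from this answer under the hypothetical name `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({"ones": [1, 3, 5], "tens": [20, 40, 60]})

# Without inplace, drop returns a new DataFrame and df_demo is untouched.
trimmed = df_demo.drop([0, 2], axis="index")
rows_before = len(df_demo)

# With inplace=True, df_demo itself is modified and the call returns None.
result = df_demo.drop([1], axis="index", inplace=True)
```

Assigning the return value (`df = df.drop(...)`) is usually the easier pattern to follow, since the `inplace=True` call returns `None`.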

score 3 · Answer 3 · edited Sep 19 '19 at 15:50

Try

df_train=df_train.reset_index()

followed by

df_train.drop(wrong_indexes_train)

My guess is df_train doesn't have a numerical index right now, rather one of the columns ['OverallQual' 'GrLivArea' 'GarageCars' 'TotalBsmtSF' 'FullBath' 'YearBuilt'] is serving as the index.
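A sketch of what `reset_index` does, assuming a frame whose index is not the default 0..n-1 range. `df_demo` and its string labels are invented for illustration:

```python
import pandas as pd

# Hypothetical frame with a non-numeric index.
df_demo = pd.DataFrame({"GrLivArea": [1710, 1262, 1786]},
                       index=["x", "y", "z"])

# reset_index() installs a fresh 0..n-1 index and keeps the old labels
# in a new column (named 'index' when the old index had no name).
df_reset = df_demo.reset_index()

# drop=True discards the old labels instead of keeping them as a column.
df_clean = df_demo.reset_index(drop=True)

# Integer labels now match row positions, so they can be dropped directly.
remaining = df_reset.drop([0, 2])
```

Whether to keep the old index as a column depends on whether it carries information you still need.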

edited Sep 19 '19 at 15:50

Mehmet Hakan Kurtoğlu

answered Dec 21 '17 at 21:42

Jeff Otieno

Please consider editing this to add in code blocks, it would greatly improve readability. – Thomas Smyth - Treliant Dec 21 '17 at 21:57
index refers to rows, not columns. – Nguai al Jan 11 '19 at 04:20

score 0 · Answer 4 · answered Mar 04 '22 at 10:51

One can use DataFrame.drop for that.

To drop rows, one should use axis=0 or axis='index'. To drop columns, use axis=1 or axis='columns'.

For your specific case, one can do

wrong_indexes_train = [0, 63, 151, 469, 1008]

df_train.drop(wrong_indexes_train, axis=0, inplace=True)

or

df_train.drop(df_train.index[[0, 63, 151, 469, 1008]], axis=0, inplace=True)

One can also select the rows with DataFrame.index

wrong_indexes_train = df_train.index[[0, 63, 151, 469, 1008]]

df_train.drop(wrong_indexes_train, inplace=True)
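Note that `df_train.index[[...]]` treats the numbers as row positions and converts them to labels, whereas passing the list straight to `drop` treats them as labels. The two only coincide for a default 0..n-1 index. A small sketch with an invented frame (`df_demo` is hypothetical) whose labels differ from the positions:

```python
import pandas as pd

# Labels 10, 20, 30, 40 sit at positions 0, 1, 2, 3.
df_demo = pd.DataFrame({"v": [1, 2, 3, 4]}, index=[10, 20, 30, 40])

# Positions 0 and 2 map to the labels 10 and 30.
labels = df_demo.index[[0, 2]]

# Dropping those labels removes the first and third rows.
by_position = df_demo.drop(labels)
```

Here `df_demo.drop([0, 2])` would fail instead, because 0 and 2 are not labels in this index.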

On the other hand, assuming that both the dataframe and the set of rows to drop are considerably large, one might want to consider selecting the rows to keep instead (as Dennis Golomazov suggests here). For that, one might use Mad Physicist's approach:

import numpy as np

wrong_indexes_train = [0, 63, 151, 469, 1008]

# Boolean mask over row positions: True means keep the row
mask = np.ones(len(df_train), dtype=bool)

mask[wrong_indexes_train] = False

df_train_new = df_train.iloc[mask]
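The NumPy mask above works on row positions. If the numbers are meant as index labels, a label-based variant of the same keep-mask idea can be built with `Index.isin`. This is a different technique from the one quoted above, sketched on a made-up frame (`df_demo` and `wrong` are hypothetical names):

```python
import pandas as pd

df_demo = pd.DataFrame({"v": range(6)})
wrong = [0, 3, 5]

# True where the row's label is NOT in the list of labels to remove.
keep = ~df_demo.index.isin(wrong)

# Boolean indexing keeps only the rows marked True.
df_kept = df_demo[keep]
```

Unlike `drop`, this does not raise if some of the labels are missing from the index; they are simply ignored.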
