How to drop a list of rows from Pandas dataframe?

Question

I have a DataFrame df:

>>> df
                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20060630   6.590       NaN      6.590   5.291
       20060930  10.103       NaN     10.103   7.981
       20061231  15.915       NaN     15.915  12.686
       20070331   3.196       NaN      3.196   2.710
       20070630   7.907       NaN      7.907   6.459

Then I want to drop the rows whose positions are given in a list, say [1,2,4], which leaves:

                  sales  discount  net_sales    cogs
STK_ID RPT_Date                                     
600141 20060331   2.709       NaN      2.709   2.245
       20061231  15.915       NaN     15.915  12.686
       20070630   7.907       NaN      7.907   6.459

How can I do that, or which function does it?

just to clarify, this question is about dropping rows with specific index values.. their use of [1,2,4] is to point to the rows *left over* after dropping. There are answers below that do this. — alchemy, Apr 22 '20 at 18:36

score 515 · Accepted Answer · edited Dec 13 '22 at 02:05

Use DataFrame.drop and pass it the index labels of the rows to drop. df.index[[1, 3]] turns row positions 1 and 3 into those labels:

In [65]: df
Out[65]: 
       one  two
one      1    4
two      2    3
three    3    2
four     4    1
    
    
In [66]: df.drop(df.index[[1, 3]])
Out[66]: 
       one  two
one      1    4
three    3    2
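Applied to the question's own MultiIndex frame, the same position-to-label trick removes the rows at positions 1, 2 and 4. A minimal sketch, assuming the values are copied from the question's printout:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values copied from the question's printout).
index = pd.MultiIndex.from_product(
    [[600141], [20060331, 20060630, 20060930, 20061231, 20070331, 20070630]],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "discount": [np.nan] * 6,
        "net_sales": [2.709, 6.590, 10.103, 15.915, 3.196, 7.907],
        "cogs": [2.245, 5.291, 7.981, 12.686, 2.710, 6.459],
    },
    index=index,
)

# df.index[[1, 2, 4]] maps positions to (STK_ID, RPT_Date) tuples; drop removes those labels.
result = df.drop(df.index[[1, 2, 4]])
print(result)
```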

edited Dec 13 '22 at 02:05

Community

answered Feb 02 '13 at 12:11

tzelleke

+1. In addition, to drop the last row: df.drop(df.tail(1).index) – Nasser Al-Wohaibi Feb 26 '14 at 20:55
This answer only works if df.index.unique() is the same as df.index, which is not a requirement for a Pandas DataFrame. Does anyone have a solution when df.index values are not guaranteed to be unique? – J Jones Jun 29 '16 at 16:38
this doesn't allow you to index on the index name itself – ingrid Nov 02 '16 at 20:33
Folks, in examples, if you want to be clear, please don't use the same strings for rows and columns. That's fine for those who really know their stuff already. Frustrating for those trying to learn. – gseattle Mar 19 '17 at 07:40
How can you do this with a range of rows? Say, from row 0 to row n. – mezzanaccio Jun 05 '18 at 16:27
newcomers to Python: note that if you want to drop these rows and keep the result in the same DataFrame (in place), you also need to add `inplace=True`. `axis=0` (0 = rows, 1 = columns) is the default, so passing it only makes the intent explicit, as in `df.drop(df.index[[1,3]], axis=0, inplace=True)`. @mezzanaccio, if you specifically know which positions you want to drop (using your 0 to n example): `df.drop(df.index[range(0, n)], axis=0, inplace=True)` – mrbTT Aug 02 '18 at 20:02
@YuryBayda `ix` has been [deprecated, see this for a replacement](https://stackoverflow.com/questions/43838999/pandas-replacement-for-ix) – Connor Mar 24 '19 at 19:50
@JJones In this case, I find that I can achieve the effect if I reset_index() to change the index to the default integer index, then drop() the relevant row indexes, then set_index() to change the index back to the original index column. I can't find any documentation that the default integer index is [0, 1, ...] or that it's unique, though. – Joshua Chia Nov 21 '19 at 06:17
Am I wrong or does the index start with 0 as in danielhadar's answer below `df.drop(df.index[0])`, which would mean this answer would drop the second and fourth rows? Ah, yes, it does, and that is what the answer shows... right. Not the clearest example, but correct. I would have used maybe [1,2] to point this out. – alchemy Apr 22 '20 at 18:40
It gives me a warning SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame – Falco Peregrinus Mar 02 '22 at 20:25

score 155 · Answer 2 · answered Jan 05 '16 at 14:28

Note that it may be important to use the inplace argument when you want to modify the DataFrame itself:

df.drop(df.index[[1,3]], inplace=True)

Because df.drop() returns a new DataFrame by default and leaves df unchanged, either pass inplace=True or reassign the result (df = df.drop(...)). http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.drop.html
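To make the difference concrete, here is a minimal sketch on a small made-up frame: without inplace=True, drop returns a new DataFrame and df keeps all its rows; with it, df itself shrinks and the call returns None.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3, 4], "two": [4, 3, 2, 1]},
                  index=["one", "two", "three", "four"])

# Default: drop() returns a new DataFrame and df is left untouched.
trimmed = df.drop(df.index[[1, 3]])
print(len(df), len(trimmed))

# inplace=True modifies df itself and returns None.
returned = df.drop(df.index[[1, 3]], inplace=True)
print(returned, list(df.index))
```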

score 76 · Answer 3 · answered Apr 15 '17 at 01:57

If the DataFrame is huge and the number of rows to drop is large as well, a simple drop by index, df.drop(df.index[...]), takes too much time.

In my case, I have a multi-indexed DataFrame of floats with 100M rows x 3 cols, and I need to remove 10k rows from it. The fastest method I found is, quite counterintuitively, to take the remaining rows.

Let indexes_to_drop be an array of positional indexes to drop ([1, 2, 4] in the question).

indexes_to_keep = sorted(set(range(df.shape[0])) - set(indexes_to_drop))  # sorted keeps the original row order
df_sliced = df.take(indexes_to_keep)

In my case this took 20.5s, while the simple df.drop took 5min 27s and consumed a lot of memory. The resulting DataFrame is the same.
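A self-contained sketch of the same idea on a toy frame (data made up for illustration). It swaps the set difference for numpy's setdiff1d, which returns the surviving positions already sorted, so take() keeps the rows in their original order:

```python
import numpy as np
import pandas as pd

# Toy frame with letter labels so positions and labels are clearly different.
df = pd.DataFrame({"x": range(6)}, index=list("abcdef"))
indexes_to_drop = [1, 2, 4]  # positions to drop, as in the question

# Sorted positions that are not in indexes_to_drop.
indexes_to_keep = np.setdiff1d(np.arange(len(df)), indexes_to_drop)
df_sliced = df.take(indexes_to_keep)
print(df_sliced)
```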

answered Apr 15 '17 at 01:57

Dennis Golomazov

Wouldn't it be cheaper to just negate a mask rather than creating a set? Something like `m = np.ones(len(df), bool); m[indices_to_drop] = False`? – Mad Physicist May 17 '21 at 14:13
@MadPhysicist that should probably be more efficient, thanks! – Dennis Golomazov Aug 12 '22 at 21:43
working on 50m+ rows. this is really fast ~2mins on the Fargate container. – Ali Berat Çetin Jul 07 '23 at 15:23

score 51 · Answer 4 · edited Aug 18 '20 at 10:52

I solved this in a simpler way, in just 2 steps:

1. Make a DataFrame with the unwanted rows.
2. Use the index of this unwanted DataFrame to drop the rows from the original DataFrame.

Example:
Suppose you have a DataFrame df with many columns, including 'Age', which is an integer. Now let's say you want to drop all the rows where 'Age' is negative.

df_age_negative = df[ df['Age'] < 0 ] # Step 1
df = df.drop(df_age_negative.index, axis=0) # Step 2

Hope this is much simpler and helps you.
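As a comment below points out, this two-step approach can remove too many rows when the index has duplicate labels, because drop removes every row carrying a listed label. A minimal sketch with made-up data, alongside the plain boolean filter that avoids the issue:

```python
import pandas as pd

# Made-up data with a duplicated index label (1).
df = pd.DataFrame({"Age": [30, -5, 42, 17]}, index=[0, 1, 1, 2])

# Two-step drop: label 1 is removed everywhere, including the valid Age=42 row.
dropped = df.drop(df[df["Age"] < 0].index)
print(dropped["Age"].tolist())

# Boolean filter: keeps rows by condition, independent of index uniqueness.
kept = df[~(df["Age"] < 0)]
print(kept["Age"].tolist())
```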

edited Aug 18 '20 at 10:52

Nauman Naeem

answered Dec 28 '17 at 07:05

Krishnaprasad Challuru

+1, this is the only answer that tells you how to remove a row selecting a column different from the first one. – Alejo Bernardin May 03 '20 at 05:31
This is the answer which I was looking. Thanks Krishnaprasad garu – codingbruh Sep 29 '20 at 10:55
Note that this can produce incorrect results if the index contains duplicate values – Joe Jan 04 '23 at 17:46

score 48 · Answer 5 · edited Sep 24 '17 at 10:17

You can also pass a single label to DataFrame.drop (instead of an Index of labels):

In[17]: df
Out[17]: 
            a         b         c         d         e
one  0.456558 -2.536432  0.216279 -1.305855 -0.121635
two -1.015127 -0.445133  1.867681  2.179392  0.518801

In[18]: df.drop('one')
Out[18]: 
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801

Which is equivalent to:

In[19]: df.drop(df.index[[0]])
Out[19]: 
            a         b         c         d         e
two -1.015127 -0.445133  1.867681  2.179392  0.518801
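A small runnable check, on toy values rather than the ones printed above, that a single label and df.index[0] remove the same row. It also shows DataFrame.drop's errors="ignore" option for labels that may not exist:

```python
import pandas as pd

df = pd.DataFrame({"a": [0.46, -1.02], "b": [-2.54, -0.45]}, index=["one", "two"])

by_label = df.drop("one")            # a single label
by_position = df.drop(df.index[0])   # position 0 mapped to its label
print(by_label.equals(by_position))

# A missing label raises KeyError unless errors="ignore" is passed.
unchanged = df.drop("three", errors="ignore")
print(unchanged.shape)
```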

edited Sep 24 '17 at 10:17

answered May 08 '16 at 08:28

danielhadar

df.drop(df.index[0]) also works. I mean, no need for double square brackets (with pandas 0.18.1, at least) – tagoma Dec 14 '16 at 12:51

score 17 · Answer 6 · edited Dec 18 '17 at 00:53

If I want to drop the row with index label x, I would do the following:

df = df[df.index != x]

If I want to drop multiple rows by position (say the positions are in the list unwanted_indices), I would do:

desired_indices = [i for i in range(len(df.index)) if i not in unwanted_indices]
desired_df = df.iloc[desired_indices]
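A runnable version on a toy frame (x and unwanted_indices are made up for illustration). Converting unwanted_indices to a set keeps each membership test O(1), which matters when the list is long:

```python
import pandas as pd

df = pd.DataFrame({"v": [10, 20, 30, 40, 50]}, index=list("abcde"))
x = "c"                  # label to drop (illustrative)
unwanted_indices = [1, 3]  # positions to drop (illustrative)

# Single label: boolean comparison on the index.
without_x = df[df.index != x]

# Multiple positions: keep every position not in the set.
unwanted = set(unwanted_indices)
desired_indices = [i for i in range(len(df.index)) if i not in unwanted]
desired_df = df.iloc[desired_indices]
print(list(without_x.index), list(desired_df.index))
```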

edited Dec 18 '17 at 00:53

Nikos Tavoularis

answered Nov 19 '17 at 19:19

Divyansh

This works for what I wanted, thanks! Drop all rows except index X. df = df[df.index == 'x'] – Chris Norris Jul 27 '20 at 00:07

score 12 · Answer 7 · edited Apr 18 '19 at 02:47

Here is a more specific example I would like to show. Say you have many duplicate entries in some of your rows. If the entries are strings, you can easily use string methods to find the index labels of all the rows to drop:

ind_drop = df[df['column_of_strings'].apply(lambda x: x.startswith('Keyword'))].index

Then drop those rows using their index labels:

new_df = df.drop(ind_drop)
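A vectorized variant, sketched on made-up data: Series.str.startswith avoids the Python-level lambda, and its na=False argument keeps missing values from breaking the filter (the apply version above raises AttributeError on None). Here the rows are filtered out with a boolean mask rather than dropped by label:

```python
import pandas as pd

df = pd.DataFrame({"column_of_strings": ["Keyword one", "keep me", "Keyword two", None]})

# True where the string starts with "Keyword"; None counts as no match.
mask = df["column_of_strings"].str.startswith("Keyword", na=False)
new_df = df[~mask]
print(new_df)
```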

edited Apr 18 '19 at 02:47

answered Jan 10 '19 at 05:50

Ozkan Serttas

score 8 · Answer 8 · answered Oct 14 '19 at 05:44

Use only the index argument to drop a row (it takes index labels, not positions):

df.drop(index=2, inplace=True)

For multiple rows:

df.drop(index=[1, 3], inplace=True)
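Because index= takes labels, it only matches positions when the frame has the default 0..n-1 index. A minimal sketch with a made-up frame whose labels differ from its positions:

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=[10, 11, 12, 13])

by_label = df.drop(index=[11, 13])             # labels 11 and 13
by_position = df.drop(index=df.index[[1, 3]])  # positions 1 and 3 -> labels 11 and 13

try:
    df.drop(index=[1, 3])  # no rows are labelled 1 or 3
    raised = False
except KeyError:
    raised = True
print(by_label.index.tolist(), "KeyError raised:", raised)
```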

answered Oct 14 '19 at 05:44

kamran kausar

mepstein · Answer 9 · 2016-12-22T20:50:31.617

In a comment to @theodros-zelleke's answer, @j-jones asked about what to do if the index is not unique. I had to deal with such a situation. What I did was to rename the duplicates in the index before I called drop(), a la:

dropped_indexes = <determine-indexes-to-drop>
df.index = rename_duplicates(df.index)
df.drop(df.index[dropped_indexes], inplace=True)

where rename_duplicates() is a function I defined that went through the elements of index and renamed the duplicates. I used the same renaming pattern as pd.read_csv() uses on columns, i.e., "%s.%d" % (name, count), where name is the name of the row and count is how many times it has occurred previously.
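The answer does not show rename_duplicates(), so the version below is a hypothetical sketch that follows the described "%s.%d" % (name, count) pattern. It does not guard against a generated name such as "a.1" already existing in the index:

```python
import pandas as pd

def rename_duplicates(index):
    """Hypothetical helper: suffix repeated labels as 'label.count'."""
    seen = {}
    new_labels = []
    for label in index:
        count = seen.get(label, 0)
        new_labels.append(label if count == 0 else "%s.%d" % (label, count))
        seen[label] = count + 1
    return pd.Index(new_labels)

df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=["a", "b", "a", "a"])
dropped_indexes = [2]  # positions to drop (made up for illustration)

df.index = rename_duplicates(df.index)  # labels become a, b, a.1, a.2
df.drop(df.index[dropped_indexes], inplace=True)
print(df)
```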

score 3 · Answer 10 · answered Apr 17 '19 at 05:42

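Determining the index from a boolean mask as described above, e.g.

df[df['column'].isin(values)].index

can be more memory intensive than computing the matching positions with np.where and mapping them back to index labels:

df.index[np.where(df['column'].isin(values))[0]]

applied like so:

df.drop(df.index[np.where(df['column'].isin(values))[0]], inplace=True)

This method is useful when dealing with large DataFrames and limited memory. The mapping through df.index matters: np.where returns positions while drop expects labels, so passing pd.Index(positions) to drop directly only works when the DataFrame has the default 0..n-1 index.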
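A minimal sketch on a made-up frame with a non-default index, where the positions from np.where and the labels that drop needs are different numbers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column": ["x", "y", "x", "z"]}, index=[100, 101, 102, 103])
values = ["x"]

positions = np.where(df["column"].isin(values))[0]  # positions 0 and 2
labels = df.index[positions]                        # labels 100 and 102
df.drop(labels, inplace=True)
print(df)
```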

score 3 · Answer 11 · answered Jan 17 '21 at 13:49

To drop the rows with index labels 1, 2 and 4 you can use:

df[~df.index.isin([1, 2, 4])]

The tilde operator ~ negates the boolean result of isin. Another option is to drop the labels from the index and select the remaining rows with loc:

df.loc[df.index.drop([1, 2, 4])]
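Both variants on a toy frame with the default RangeIndex. One practical difference: isin silently ignores labels that are not in the index, while Index.drop raises KeyError for them (unless errors="ignore" is passed).

```python
import pandas as pd

df = pd.DataFrame({"v": list("abcdef")})  # default index 0..5

kept_isin = df[~df.index.isin([1, 2, 4])]
kept_loc = df.loc[df.index.drop([1, 2, 4])]
print(kept_isin["v"].tolist(), kept_loc["v"].tolist())
```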

answered Jan 17 '21 at 13:49

Mykola Zotko

score 2 · Answer 12 · edited Nov 26 '20 at 07:08

Look at the following DataFrame df:

df

   column1  column2  column3
0        1       11       21
1        2       12       22
2        3       13       23
3        4       14       24
4        5       15       25
5        6       16       26
6        7       17       27
7        8       18       28
8        9       19       29
9       10       20       30

Let's drop all the rows that have an odd number in column1.

Create a list of all the elements in column1, keeping only the even numbers (the elements you don't want to drop):

keep_elements = [x for x in df.column1 if x % 2 == 0]

All the rows with the values [2, 4, 6, 8, 10] in column1 will be kept.

df.set_index('column1', inplace=True)
df.drop(df.index.difference(keep_elements), axis=0, inplace=True)
df.reset_index(inplace=True)

We make column1 the index, drop all the rows that are not required, and then reset the index. The result:

df

   column1  column2  column3
0        2       12       22
1        4       14       24
2        6       16       26
3        8       18       28
4       10       20       30
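The same result can be obtained without moving column1 into the index, by filtering with a boolean condition and renumbering with reset_index(drop=True). A sketch that rebuilds the example frame:

```python
import pandas as pd

df = pd.DataFrame({
    "column1": range(1, 11),
    "column2": range(11, 21),
    "column3": range(21, 31),
})

# Keep rows where column1 is even, then renumber the index 0..n-1.
result = df[df["column1"] % 2 == 0].reset_index(drop=True)
print(result)
```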

score 2 · Answer 13 · answered May 17 '21 at 15:15

As Dennis Golomazov's answer suggests, instead of using drop to drop rows, you can select the rows to keep. Let's say you have a list of row positions to drop called indices_to_drop. You can convert it to a boolean mask as follows:

import numpy as np

mask = np.ones(len(df), bool)
mask[indices_to_drop] = False

You can use this mask directly:

df_new = df.iloc[mask]

The nice thing about this method is that mask can come from any source: it can be a condition involving many columns, or something else.

The really nice thing is that you don't need the index of the original DataFrame at all, so it doesn't matter whether the index is unique or not.

The disadvantage is of course that you can't do the drop in-place with this method.
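A self-contained version of the mask approach on a made-up frame with duplicate index labels, illustrating that only positions matter:

```python
import numpy as np
import pandas as pd

# Duplicate labels on purpose; the mask works on positions, so they do not matter.
df = pd.DataFrame({"v": [10, 20, 30, 40, 50]}, index=["a", "a", "b", "b", "c"])
indices_to_drop = [1, 3]

mask = np.ones(len(df), dtype=bool)
mask[indices_to_drop] = False
df_new = df.iloc[mask]
print(df_new)
```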

score 0 · Answer 14 · answered Dec 26 '19 at 03:37

Consider an example DataFrame:

df =     
index    column1
0           00
1           10
2           20
3           30

We want to drop the 2nd and 3rd rows (positions 1 and 2).

Approach 1:

df = df.drop(df.index[[1, 2]])
or
df.drop(df.index[[1, 2]], inplace=True)
print(df)

df =     
index    column1
0           00
3           30

#This approach removes the rows as we wanted, but the remaining index labels (0, 3) are no longer consecutive.

Approach 2:

df = df.drop(df.index[[1, 2]]).reset_index(drop=True)
print(df)
df =     
index    column1
0           00
1           30
#This approach removes the rows as we wanted and renumbers the index; reset_index(drop=True) discards the old labels instead of keeping them as a column.

score 0 · Answer 15 · answered Oct 03 '22 at 19:36

This worked for me

# Create a list containing the index numbers you want to remove
index_list = list(range(42766, 42798))
df.drop(df.index[index_list], inplace=True)
df.shape

This drops the rows at positions 42766 through 42797 (range() excludes its end point, 42798).
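Because the positions form a contiguous range, slicing df.index gives the same labels without building a list first. A sketch on a made-up frame large enough for those positions:

```python
import pandas as pd

df = pd.DataFrame({"v": range(42800)})

index_list = list(range(42766, 42798))
via_list = df.drop(df.index[index_list])
via_slice = df.drop(df.index[42766:42798])  # same 32 positions as the list
print(via_list.shape, via_list.equals(via_slice))
```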

answered Oct 03 '22 at 19:36

The_Data_Guy
