155

Let's say I have a DataFrame that looks like this:

a  b  c  d  e  f  g  
1  2  3  4  5  6  7
4  3  7  1  6  9  4
8  9  0  2  4  2  1

How would I go about deleting every column besides a and b?

This would result in:

a  b
1  2
4  3
8  9

I would like a way to delete these using a simple line of code that says, delete all columns besides a and b, because let's say hypothetically I have 1000 columns of data.
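For reference, a minimal sketch that reproduces the example frame above (assuming pandas is imported as pd):

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7],
     [4, 3, 7, 1, 6, 9, 4],
     [8, 9, 0, 2, 4, 2, 1]],
    columns=list('abcdefg'),
)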

Thank you.

sgerbhctim

6 Answers

148
In [48]: df.drop(df.columns.difference(['a','b']), axis=1, inplace=True)

In [49]: df
Out[49]:
   a  b
0  1  2
1  4  3
2  8  9

or:

In [55]: df = df.loc[:, df.columns.intersection(['a','b'])]

In [56]: df
Out[56]:
   a  b
0  1  2
1  4  3
2  8  9
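On newer pandas versions (0.21+) the first variant can also be spelled with the columns= keyword instead of axis=1; a minimal sketch that should give the same result:

df = df.drop(columns=df.columns.difference(['a', 'b']))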

PS: please be aware that the most idiomatic pandas way to do this was already proposed by @Wen:

df = df[['a','b']]

or

df = df.loc[:, ['a','b']]
MaxU - stand with Ukraine
  • Is there any difference in terms of performance when you use inplace? A more general question would be df.drop(..., inplace=True) vs df = df[...]. Any suggestions? – PirateApp May 06 '18 at 12:10
  • @PirateApp, there might be some minimal difference. I would recommend you read [this answer from one of the main Pandas developers, Jeff, and the comments under that answer...](https://stackoverflow.com/questions/22532302/pandas-peculiar-performance-drop-for-inplace-rename-after-dropna/22533110#22533110) – MaxU - stand with Ukraine May 06 '18 at 12:22
  • @MaxU I think `df.loc` is way faster than any other method like `drop` or `filter` for processing millions of records. Please correct me if I'm wrong. – Abu Shoeb Sep 09 '20 at 20:46
97

Another option to add to the mix. I prefer this approach for readability.

df = df.filter(['a', 'b'])

The first positional argument is `items`, which takes the list of column labels to keep.


Bonus

You can also use a like argument or regex to filter.
Helpful if you have a set of columns like ['a_1','a_2','b_1','b_2']

You can do

df = df.filter(like='b_')

and end up with ['b_1','b_2']
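The regex variant mentioned above looks like this; a minimal sketch, where the pattern is just an illustration:

df = df.filter(regex=r'^b_')  # keep every column whose name starts with 'b_'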

Pandas documentation for filter.

Kermit
GollyJer
56

There are multiple solutions:

df = df[['a','b']] #1

df = df[list('ab')] #2

df = df.loc[:,df.columns.isin(['a','b'])] #3

df = pd.DataFrame(data=df.eval('a,b').T,columns=['a','b']) #4

PS: I do not recommend method #4, but it is still a way to achieve this.
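As a quick sanity check that options #1, #2 and #3 select the same two columns, here is a small sketch using pandas' built-in testing helper:

import pandas as pd

expected = df[['a', 'b']]                                                        # option 1
pd.testing.assert_frame_equal(expected, df[list('ab')])                          # option 2
pd.testing.assert_frame_equal(expected, df.loc[:, df.columns.isin(['a', 'b'])])  # option 3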
Sunny Patel
BENY
5

Hey, what you are looking for is:

df = df[["a","b"]]

You will receive a DataFrame which only contains the columns a and b.

Blowsh1t
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – lemon Jun 02 '22 at 14:14
  • This is no different than @BENY's [answer](https://stackoverflow.com/a/45846274/8285811)'s first option. – Akaisteph7 Sep 19 '22 at 16:47
3

If you want to keep more columns than you are dropping, put a "~" before the .isin statement to select every column except the ones you list:

df = df.loc[:, ~df.columns.isin(['a','b'])]
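For example, with the frame from the question this inverted selection keeps columns c through g; a small sketch:

rest = df.loc[:, ~df.columns.isin(['a', 'b'])]
print(list(rest.columns))  # ['c', 'd', 'e', 'f', 'g']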
Asclepius
2

If you have many columns to drop, let's say 20 or 30, you can instead list the columns you want to keep and drop the difference. Make sure that you also specify the axis value.

keep_list = ["a","b"]
df = df.drop(df.columns.difference(keep_list), axis=1)
Taie