155

Let's say I have a DataFrame that looks like this:

a  b  c  d  e  f  g  
1  2  3  4  5  6  7
4  3  7  1  6  9  4
8  9  0  2  4  2  1

How would I go about deleting every column besides a and b?

This would result in:

a  b
1  2
4  3
8  9

I would like a way to delete these using a simple line of code that says, delete all columns besides a and b, because let's say hypothetically I have 1000 columns of data.
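For reference, a minimal sketch that reproduces the example frame above (assuming pandas is imported as pd):

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6, 7],
     [4, 3, 7, 1, 6, 9, 4],
     [8, 9, 0, 2, 4, 2, 1]],
    columns=list('abcdefg'),
)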

Thank you.

sgerbhctim

6 Answers

148
In [48]: df.drop(df.columns.difference(['a','b']), axis=1, inplace=True)

In [49]: df
Out[49]:
   a  b
0  1  2
1  4  3
2  8  9

or:

In [55]: df = df.loc[:, df.columns.intersection(['a','b'])]

In [56]: df
Out[56]:
   a  b
0  1  2
1  4  3
2  8  9
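On newer pandas versions (0.21+) the first variant can also be spelled with the columns= keyword instead of axis=1; a minimal sketch that should give the same result:

df = df.drop(columns=df.columns.difference(['a', 'b']))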

PS: please be aware that the most idiomatic pandas way to do this was already proposed by @Wen:

df = df[['a','b']]

or

df = df.loc[:, ['a','b']]
MaxU - stand with Ukraine
  • Is there any difference in terms of performance when you use inplace? A more general question would be df.drop(..., inplace=True) vs df = df[...]. Any suggestions? – PirateApp May 06 '18 at 12:10
  • @PirateApp, there might be some minimal difference. I would recommend you read [this answer from one of the main Pandas developers, Jeff, and the comments under that answer...](https://stackoverflow.com/questions/22532302/pandas-peculiar-performance-drop-for-inplace-rename-after-dropna/22533110#22533110) – MaxU - stand with Ukraine May 06 '18 at 12:22
  • @MaxU I think `df.loc` is way faster than any other method like `drop` or `filter` for processing millions of records. Please correct me if I'm wrong. – Abu Shoeb Sep 09 '20 at 20:46
97

Another option to add to the mix. I prefer this approach for readability.

df = df.filter(['a', 'b'])

The first positional argument is `items`, which takes the list of column labels to keep.


Bonus

You can also use a like argument or regex to filter.
Helpful if you have a set of columns like ['a_1','a_2','b_1','b_2']

You can do

df = df.filter(like='b_')

and end up with ['b_1','b_2']
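The regex variant mentioned above looks like this; a minimal sketch, where the pattern is just an illustration:

df = df.filter(regex=r'^b_')  # keep every column whose name starts with 'b_'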

Pandas documentation for filter.

Kermit
GollyJer
56

There are multiple solutions:

df = df[['a','b']] #1

df = df[list('ab')] #2

df = df.loc[:,df.columns.isin(['a','b'])] #3

df = pd.DataFrame(data=df.eval('a,b').T,columns=['a','b']) #4

PS: I do not recommend method #4, but it is still a way to achieve this.
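As a quick sanity check that options #1, #2 and #3 select the same two columns, here is a small sketch using pandas' built-in testing helper:

import pandas as pd

expected = df[['a', 'b']]                                                        # option 1
pd.testing.assert_frame_equal(expected, df[list('ab')])                          # option 2
pd.testing.assert_frame_equal(expected, df.loc[:, df.columns.isin(['a', 'b'])])  # option 3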
Sunny Patel
BENY
5

Hey, what you are looking for is:

df = df[["a","b"]]

You will receive a DataFrame which only contains the columns a and b.

Blowsh1t
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – lemon Jun 02 '22 at 14:14
  • This is no different than @BENY's [answer](https://stackoverflow.com/a/45846274/8285811)'s first option. – Akaisteph7 Sep 19 '22 at 16:47
3

If you want to keep more columns than you are dropping, put a "~" before the .isin statement to select every column except the ones you list:

df = df.loc[:, ~df.columns.isin(['a','b'])]
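For example, with the frame from the question this inverted selection keeps columns c through g; a small sketch:

rest = df.loc[:, ~df.columns.isin(['a', 'b'])]
print(list(rest.columns))  # ['c', 'd', 'e', 'f', 'g']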
Asclepius
2

If you have many columns to drop, let's say 20 or 30, you can instead list the columns you want to keep and drop the difference. Make sure that you also specify the axis value.

keep_list = ["a","b"]
df = df.drop(df.columns.difference(keep_list), axis=1)
Taie