How to select all columns except one in pandas?

Question

I have a dataframe that look like this:

          a         b         c         d
0  0.418762  0.042369  0.869203  0.972314
1  0.991058  0.510228  0.594784  0.534366
2  0.407472  0.259811  0.396664  0.894202
3  0.726168  0.139531  0.324932  0.906575

How I can get all columns except b?

@cs95 -- The currently listed duplicate target isn't a duplicate. Despite the original title, the linked question is "Why doesn't this specific syntax work", whereas this question is a more general "What is the best way to do this". -- Add to this the difference between deleting a column from an existing DataFrame versus creating a new DataFrame with all-but-one of the columns of another. — R.M., May 21 '19 at 19:30
@R.M. I'm sorry but I don't agree with the edit you've made to the title on that post, so I've rolled it back. It's true that the intent of the OP was to question the syntax, but the post has grown to address the more broad question of how to delete a column. The answers in this post are carbon copies of the highest upvoted post there. The dupe stays. — cs95, May 21 '19 at 19:46
Note this question is being discussed on [Meta](https://meta.stackoverflow.com/q/385291/215552). — Heretic Monkey, May 21 '19 at 21:24

score 760 · Accepted Answer · edited Nov 20 '19 at 19:15

760

When the columns are not a MultiIndex, df.columns is just an array of column names so you can do:

df.loc[:, df.columns != 'b']

          a         c         d
0  0.561196  0.013768  0.772827
1  0.882641  0.615396  0.075381
2  0.368824  0.651378  0.397203
3  0.788730  0.568099  0.869127

edited Nov 20 '19 at 19:15

Will

4,241
4
39
48

answered Apr 21 '15 at 05:27

Marius

58,213
16
107
105

31

Not bad, but @mike's solution using `drop` is better IMO. A bit more readable and handles multiindexes – travc Jun 30 '17 at 00:24
7

I actually agree that @mike's solution using `drop` is better - I do think it's useful to discover that (single-level) columns are arrays you can work with, but specifically for dropping a column, `drop` is very readable and works well with complex indexes. – Marius Apr 23 '19 at 23:06
1

Thank you for this greate answer. what if I don't have a header ? how do I adrress ? – FabioSpaghetti Sep 17 '19 at 14:03
3

What about when you have more than 1 column to be ignored? – Bruno Ambrozio Mar 26 '20 at 09:06
1

@Marius Does this work with multiple columns (say two)? – MasayoMusic Jun 23 '20 at 22:58
The trial way is ```python df.loc[:, (df.columns != 'b') & (df.columns != 'c')] ``` However, this grows too fast. We also can do ```python from functools import reduce df.loc[:, reduce(lambda x, y: x & (df.columns != y), ["b", "c"], df.columns != "z")] ``` and add as much columns we like (assuming we have no column 'z'). But, at this point, it is better to use `.drop()`. – an_drade Aug 10 '20 at 18:36

score 415 · Answer 2 · edited Jul 07 '23 at 18:29

415

Don't use ix. It's deprecated. The most readable and idiomatic way of doing this is df.drop():

>>> df.drop('b', axis=1)
          a         c         d
0  0.418762  0.869203  0.972314
1  0.991058  0.594784  0.534366
2  0.407472  0.396664  0.894202
3  0.726168  0.324932  0.906575

Note that by default, .drop() does not operate inplace; despite the ominous name, df is unharmed by this process. If you want to permanently remove b from df, do df.drop('b', inplace=True).

df.drop() also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1) will drop column a and b.

edited Jul 07 '23 at 18:29

wjandrea

28,235
9
60
81

answered Jun 09 '16 at 05:38

mike

4,901
2
19
19

2

Also works on a multiindex just like you'd expect it to. `df.drop([('l1name', 'l2name'), 'anotherl1name'], axis=1)`. Seems to use list vs tuple to determine if you want multiple columns (list) or referring to a multiindex (tuple). – travc Jun 30 '17 at 00:20
58

More readable: `df.drop(columns='a')` or `df.drop(columns=['a', 'b'])`. Can also replace `columns=` with `index=`. – BallpointBen May 09 '18 at 13:52
However this is not useful if you happen *not* to know the names of all the columns you want to drop. – yeliabsalohcin Sep 04 '18 at 16:17
5

Since this creates a copy and not a view/reference, you cannot modify the original dataframe by using this on the LHS of an assignment. – Jan Christoph Terasa Apr 15 '20 at 18:58
@JanChristophTerasa Do you happen to know how to modify these selected columns within original df (such multiply all these columns with values of another column). If I modify these values I would need to tack on the dropped column on the end which doesn't seem to be best way. – MasayoMusic Jun 23 '20 at 22:48
@MasayoMusic The solutions using `df.loc` or masking should be able to modify the DataFrame in place. – Jan Christoph Terasa Jun 24 '20 at 18:48

score 217 · Answer 3 · answered Aug 28 '16 at 14:05

217

df[df.columns.difference(['b'])]

Out: 
          a         c         d
0  0.427809  0.459807  0.333869
1  0.678031  0.668346  0.645951
2  0.996573  0.673730  0.314911
3  0.786942  0.719665  0.330833

answered Aug 28 '16 at 14:05

ayhan

70,170
20
182
203

16

I like this approach as it can be used to omit more than one column. – Nischal Hp Aug 30 '17 at 09:42
3

@NischalHp df.drop can also omit more than one column df.drop(['a', 'b'], axis=1) – JACKY88 Apr 25 '19 at 13:02
5

I think it's worth noting that this can re-arrange your columns – ocean800 Nov 18 '19 at 06:33
5

@ocean800 Yes that's true. You can pass `sort=False` if you want to avoid that behaviour (`df.columns.difference(['b'], sort=False)`) – ayhan Nov 18 '19 at 06:44
This is the top one that works on a `DataFrameGroupBy`, which is what I was looking for, thanks! I used `grouped[df.columns.difference(['b'])]...` – wjandrea Jun 04 '22 at 15:20
This approach is good because it can be used with several columns and inside of a loc, if you want to update all columns except 2 for example. Moreover, it's very easy to use and the syntax is clear. – Haeden Feb 01 '23 at 12:21

score 146 · Answer 4 · answered Oct 31 '18 at 01:47

146

You can use df.columns.isin()

df.loc[:, ~df.columns.isin(['b'])]

When you want to drop multiple columns, as simple as:

df.loc[:, ~df.columns.isin(['col1', 'col2'])]

answered Oct 31 '18 at 01:47

William

4,258
2
23
20

This method was helpful to modify the selected columns! – Derek O May 02 '21 at 06:16

Mykola Zotko · Answer 5 · 2022-10-19T04:58:17.053

36

You can drop columns in index:

df[df.columns.drop('b')]

or

df.loc[:, df.columns.drop('b')]

edited Oct 19 '22 at 04:58

answered Jan 17 '21 at 13:21

Mykola Zotko

15,583
3
71
73

score 15 · Answer 6 · answered Aug 18 '16 at 03:22

15

Here is another way:

df[[i for i in list(df.columns) if i != '<your column>']]

You just pass all columns to be shown except of the one you do not want.

answered Aug 18 '16 at 03:22

Salvador Dali

214,103
147
703
753

score 9 · Answer 7 · answered Sep 02 '19 at 14:57

Here is a one line lambda:

df[map(lambda x :x not in ['b'], list(df.columns))]

before:

import pandas
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))
df

       a           b           c           d
0   0.774951    0.079351    0.118437    0.735799
1   0.615547    0.203062    0.437672    0.912781
2   0.804140    0.708514    0.156943    0.104416
3   0.226051    0.641862    0.739839    0.434230

after:

df[map(lambda x :x not in ['b'], list(df.columns))]

        a          c          d
0   0.774951    0.118437    0.735799
1   0.615547    0.437672    0.912781
2   0.804140    0.156943    0.104416
3   0.226051    0.739839    0.434230

score 7 · Answer 8 · edited Apr 21 '18 at 14:39

I think the best way to do is the way mentioned by @Salvador Dali. Not that the others are wrong.

Because when you have a data set where you just want to select one column and put it into one variable and the rest of the columns into another for comparison or computational purposes. Then dropping the column of the data set might not help. Of course there are use cases for that as well.

x_cols = [x for x in data.columns if x != 'name of column to be excluded']

Then you can put those collection of columns in variable x_cols into another variable like x_cols1 for other computation.

ex: x_cols1 = data[x_cols]

Can you explain why this is a separate answer instead of a comment / extension to Salvador's answer? — , Nov 06 '19 at 09:10

score 7 · Answer 9 · answered Sep 23 '18 at 14:29

7

Another slight modification to @Salvador Dali enables a list of columns to exclude:

df[[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]

or

df.loc[:,[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]

answered Sep 23 '18 at 14:29

user1718097

4,090
11
48
63

score 2 · Answer 10 · answered Oct 08 '22 at 19:34

I've tested speed and found that for me the .loc solution was the fastest

df_working_1.loc[:, df_working_1.columns != "market_id"] 
# 7.19 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1.drop("market_id", axis=1)
# 7.65 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1[df_working_1.columns.difference(['market_id'])]
# 7.58 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df_working_1[[i for i in list(df_working_1.columns) if i != 'market_id']]
# 7.57 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

score 1 · Answer 11 · answered Sep 10 '20 at 13:23

1

I think a nice solution is with the function filter of pandas and regex (match everything except "b"):

df.filter(regex="^(?!b$)")

answered Sep 10 '20 at 13:23

Billy Bonaros

1,671
11
18

score 1 · Answer 12 · answered Nov 08 '21 at 07:47

1

Similar to @Toms answer, it is also possible to select all columns except "b" without using .loc, like so:

df[df.columns[~df.columns.isin(['b'])]]

answered Nov 08 '21 at 07:47

DataBach

1,330
2
16
31

cottontail · Answer 13 · 2023-04-12T04:11:37.427

You can also pop() a column. It removes a column from a dataframe but returns it as a Series, which you assign to a value (y below). If you don't assign, it's just thrown away. One case where this is quite useful is to separate the target variable from the feature set in ML. For example:

X = pd.DataFrame({'feature1': range(5), 'feature2': range(6,11), 'target': [0,0,0,1,1]})
y = X.pop('target')

It makes the following transformation:

score 0 · Answer 14 · answered May 12 '23 at 06:23

This allows you to drop multiple columns even if you aren't sure they exist, and works for MultiIndex columns too.

df.drop(columns=[x for x in ('abc', ('foo', 'bar')) if x in df.columns])

In this example (assuming a 2-level MultiIndex) it will drop all columns with abc in the first level, and it will also drop the single column ('foo', 'bar')

I've added this answer as this is the first question that appears even when searching for MultiIndex.

How to select all columns except one in pandas?

14 Answers14

Linked

Related