549

I have a dataframe that look like this:

          a         b         c         d
0  0.418762  0.042369  0.869203  0.972314
1  0.991058  0.510228  0.594784  0.534366
2  0.407472  0.259811  0.396664  0.894202
3  0.726168  0.139531  0.324932  0.906575

How I can get all columns except b?

wjandrea
  • 28,235
  • 9
  • 60
  • 81
markov zain
  • 11,987
  • 13
  • 35
  • 39
  • @cs95 -- The currently listed duplicate target isn't a duplicate. Despite the original title, the linked question is "Why doesn't this specific syntax work", whereas this question is a more general "What is the best way to do this". -- Add to this the difference between deleting a column from an existing DataFrame versus creating a new DataFrame with all-but-one of the columns of another. – R.M. May 21 '19 at 19:30
  • @R.M. I'm sorry but I don't agree with the edit you've made to the title on that post, so I've rolled it back. It's true that the intent of the OP was to question the syntax, but the post has grown to address the more broad question of how to delete a column. The answers in this post are carbon copies of the highest upvoted post there. The dupe stays. – cs95 May 21 '19 at 19:46
  • Note this question is being discussed on [Meta](https://meta.stackoverflow.com/q/385291/215552). – Heretic Monkey May 21 '19 at 21:24

14 Answers14

760

When the columns are not a MultiIndex, df.columns is just an array of column names so you can do:

df.loc[:, df.columns != 'b']

          a         c         d
0  0.561196  0.013768  0.772827
1  0.882641  0.615396  0.075381
2  0.368824  0.651378  0.397203
3  0.788730  0.568099  0.869127
Will
  • 4,241
  • 4
  • 39
  • 48
Marius
  • 58,213
  • 16
  • 107
  • 105
  • 31
    Not bad, but @mike's solution using `drop` is better IMO. A bit more readable and handles multiindexes – travc Jun 30 '17 at 00:24
  • 7
    I actually agree that @mike's solution using `drop` is better - I do think it's useful to discover that (single-level) columns are arrays you can work with, but specifically for dropping a column, `drop` is very readable and works well with complex indexes. – Marius Apr 23 '19 at 23:06
  • 1
    Thank you for this greate answer. what if I don't have a header ? how do I adrress ? – FabioSpaghetti Sep 17 '19 at 14:03
  • 3
    What about when you have more than 1 column to be ignored? – Bruno Ambrozio Mar 26 '20 at 09:06
  • 1
    @Marius Does this work with multiple columns (say two)? – MasayoMusic Jun 23 '20 at 22:58
  • The trial way is ```python df.loc[:, (df.columns != 'b') & (df.columns != 'c')] ``` However, this grows too fast. We also can do ```python from functools import reduce df.loc[:, reduce(lambda x, y: x & (df.columns != y), ["b", "c"], df.columns != "z")] ``` and add as much columns we like (assuming we have no column 'z'). But, at this point, it is better to use `.drop()`. – an_drade Aug 10 '20 at 18:36
415

Don't use ix. It's deprecated. The most readable and idiomatic way of doing this is df.drop():

>>> df.drop('b', axis=1)
          a         c         d
0  0.418762  0.869203  0.972314
1  0.991058  0.594784  0.534366
2  0.407472  0.396664  0.894202
3  0.726168  0.324932  0.906575

Note that by default, .drop() does not operate inplace; despite the ominous name, df is unharmed by this process. If you want to permanently remove b from df, do df.drop('b', inplace=True).

df.drop() also accepts a list of labels, e.g. df.drop(['a', 'b'], axis=1) will drop column a and b.

wjandrea
  • 28,235
  • 9
  • 60
  • 81
mike
  • 4,901
  • 2
  • 19
  • 19
  • 2
    Also works on a multiindex just like you'd expect it to. `df.drop([('l1name', 'l2name'), 'anotherl1name'], axis=1)`. Seems to use list vs tuple to determine if you want multiple columns (list) or referring to a multiindex (tuple). – travc Jun 30 '17 at 00:20
  • 58
    More readable: `df.drop(columns='a')` or `df.drop(columns=['a', 'b'])`. Can also replace `columns=` with `index=`. – BallpointBen May 09 '18 at 13:52
  • However this is not useful if you happen *not* to know the names of all the columns you want to drop. – yeliabsalohcin Sep 04 '18 at 16:17
  • 5
    Since this creates a copy and not a view/reference, you cannot modify the original dataframe by using this on the LHS of an assignment. – Jan Christoph Terasa Apr 15 '20 at 18:58
  • @JanChristophTerasa Do you happen to know how to modify these selected columns within original df (such multiply all these columns with values of another column). If I modify these values I would need to tack on the dropped column on the end which doesn't seem to be best way. – MasayoMusic Jun 23 '20 at 22:48
  • @MasayoMusic The solutions using `df.loc` or masking should be able to modify the DataFrame in place. – Jan Christoph Terasa Jun 24 '20 at 18:48
217
df[df.columns.difference(['b'])]

Out: 
          a         c         d
0  0.427809  0.459807  0.333869
1  0.678031  0.668346  0.645951
2  0.996573  0.673730  0.314911
3  0.786942  0.719665  0.330833
ayhan
  • 70,170
  • 20
  • 182
  • 203
  • 16
    I like this approach as it can be used to omit more than one column. – Nischal Hp Aug 30 '17 at 09:42
  • 3
    @NischalHp df.drop can also omit more than one column df.drop(['a', 'b'], axis=1) – JACKY88 Apr 25 '19 at 13:02
  • 5
    I think it's worth noting that this can re-arrange your columns – ocean800 Nov 18 '19 at 06:33
  • 5
    @ocean800 Yes that's true. You can pass `sort=False` if you want to avoid that behaviour (`df.columns.difference(['b'], sort=False)`) – ayhan Nov 18 '19 at 06:44
  • This is the top one that works on a `DataFrameGroupBy`, which is what I was looking for, thanks! I used `grouped[df.columns.difference(['b'])]...` – wjandrea Jun 04 '22 at 15:20
  • This approach is good because it can be used with several columns and inside of a loc, if you want to update all columns except 2 for example. Moreover, it's very easy to use and the syntax is clear. – Haeden Feb 01 '23 at 12:21
146

You can use df.columns.isin()

df.loc[:, ~df.columns.isin(['b'])]

When you want to drop multiple columns, as simple as:

df.loc[:, ~df.columns.isin(['col1', 'col2'])]
William
  • 4,258
  • 2
  • 23
  • 20
36

You can drop columns in index:

df[df.columns.drop('b')]

or

df.loc[:, df.columns.drop('b')]
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73
15

Here is another way:

df[[i for i in list(df.columns) if i != '<your column>']]

You just pass all columns to be shown except of the one you do not want.

Salvador Dali
  • 214,103
  • 147
  • 703
  • 753
9

Here is a one line lambda:

df[map(lambda x :x not in ['b'], list(df.columns))]

before:

import pandas
import numpy as np
df = pd.DataFrame(np.random.rand(4,4), columns = list('abcd'))
df

       a           b           c           d
0   0.774951    0.079351    0.118437    0.735799
1   0.615547    0.203062    0.437672    0.912781
2   0.804140    0.708514    0.156943    0.104416
3   0.226051    0.641862    0.739839    0.434230

after:

df[map(lambda x :x not in ['b'], list(df.columns))]

        a          c          d
0   0.774951    0.118437    0.735799
1   0.615547    0.437672    0.912781
2   0.804140    0.156943    0.104416
3   0.226051    0.739839    0.434230
Grant Shannon
  • 4,709
  • 1
  • 46
  • 36
7

I think the best way to do is the way mentioned by @Salvador Dali. Not that the others are wrong.

Because when you have a data set where you just want to select one column and put it into one variable and the rest of the columns into another for comparison or computational purposes. Then dropping the column of the data set might not help. Of course there are use cases for that as well.

x_cols = [x for x in data.columns if x != 'name of column to be excluded']

Then you can put those collection of columns in variable x_cols into another variable like x_cols1 for other computation.

ex: x_cols1 = data[x_cols]
MRizwan33
  • 2,723
  • 6
  • 31
  • 42
Sudhi
  • 421
  • 1
  • 8
  • 19
  • Can you explain why this is a separate answer instead of a comment / extension to Salvador's answer? –  Nov 06 '19 at 09:10
7

Another slight modification to @Salvador Dali enables a list of columns to exclude:

df[[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]

or

df.loc[:,[i for i in list(df.columns) if i not in [list_of_columns_to_exclude]]]
user1718097
  • 4,090
  • 11
  • 48
  • 63
2

I've tested speed and found that for me the .loc solution was the fastest

df_working_1.loc[:, df_working_1.columns != "market_id"] 
# 7.19 ms ± 201 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df_working_1.drop("market_id", axis=1)
# 7.65 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df_working_1[df_working_1.columns.difference(['market_id'])]
# 7.58 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
df_working_1[[i for i in list(df_working_1.columns) if i != 'market_id']]
# 7.57 ms ± 144 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
dimay
  • 2,768
  • 1
  • 13
  • 22
1

I think a nice solution is with the function filter of pandas and regex (match everything except "b"):

df.filter(regex="^(?!b$)")
Billy Bonaros
  • 1,671
  • 11
  • 18
1

Similar to @Toms answer, it is also possible to select all columns except "b" without using .loc, like so:

df[df.columns[~df.columns.isin(['b'])]]
DataBach
  • 1,330
  • 2
  • 16
  • 31
0

You can also pop() a column. It removes a column from a dataframe but returns it as a Series, which you assign to a value (y below). If you don't assign, it's just thrown away. One case where this is quite useful is to separate the target variable from the feature set in ML. For example:

X = pd.DataFrame({'feature1': range(5), 'feature2': range(6,11), 'target': [0,0,0,1,1]})
y = X.pop('target')

It makes the following transformation:

res

cottontail
  • 10,268
  • 18
  • 50
  • 51
0

This allows you to drop multiple columns even if you aren't sure they exist, and works for MultiIndex columns too.

df.drop(columns=[x for x in ('abc', ('foo', 'bar')) if x in df.columns])

In this example (assuming a 2-level MultiIndex) it will drop all columns with abc in the first level, and it will also drop the single column ('foo', 'bar')

I've added this answer as this is the first question that appears even when searching for MultiIndex.

fantabolous
  • 21,470
  • 7
  • 54
  • 51