Deleting multiple columns based on column names

Question

I have some data and when I import it, I get the following unneeded columns. I'm looking for an easy way to delete all of these.

'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
'Unnamed: 28', 'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31',
'Unnamed: 32', 'Unnamed: 33', 'Unnamed: 34', 'Unnamed: 35',
'Unnamed: 36', 'Unnamed: 37', 'Unnamed: 38', 'Unnamed: 39',
'Unnamed: 40', 'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43',
'Unnamed: 44', 'Unnamed: 45', 'Unnamed: 46', 'Unnamed: 47',
'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50', 'Unnamed: 51',
'Unnamed: 52', 'Unnamed: 53', 'Unnamed: 54', 'Unnamed: 55',
'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59',
'Unnamed: 60'

The columns are 0-indexed, so I tried something like:

df.drop(df.columns[[22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 55]], axis=1, inplace=True)

But this isn't very efficient. I tried writing some for loops, but this struck me as bad Pandas behaviour. Hence I ask the question here.

I've seen some similar examples (Drop multiple columns in pandas), but they don't answer my question.

What do you mean, efficient? Is it running too slow? If your problem is that you don't want to get the indices of all the columns that you want to delete, please note that you can just give `df.drop` a list of column names: `df.drop(['Unnamed: 24', 'Unnamed: 25', ...], axis=1)` — Carsten, Feb 16 '15 at 09:53
Would it not be easier to just subset the columns of interest, i.e. `df = df[cols_of_interest]`? Otherwise you could slice the df by columns and drop those columns: `df.drop(df.loc[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)` – EdChum, Feb 16 '15 at 09:56
Might be worth noting that in most cases it's easier just to keep the columns you want than to delete the ones that you don't: `df = df[col_list]` – sparrow, Apr 27 '18 at 22:14

score 290 · Answer 1 · edited May 11 '22 at 00:06

By far the simplest approach is:

yourdf.drop(['columnheading1', 'columnheading2'], axis=1, inplace=True)
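
As a minimal sketch of this call, assuming a toy frame whose column names and values are made up for illustration, the following uses the reassignment form. It returns a new DataFrame and leaves the original untouched, which is an alternative to `inplace=True`:

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
yourdf = pd.DataFrame({
    "keep_me": [1, 2],
    "columnheading1": [3, 4],
    "columnheading2": [5, 6],
})

# drop() returns a new DataFrame unless inplace=True is passed.
trimmed = yourdf.drop(["columnheading1", "columnheading2"], axis=1)

print(list(trimmed.columns))  # columns remaining in the new frame
print(list(yourdf.columns))   # the original frame is unchanged
```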

edited May 11 '22 at 00:06

Scarabee

answered May 06 '16 at 10:08

Philipp Schwarz

I used this format in some of my code and I get a `SettingWithCopyWarning`? – KillerSnail Jan 08 '17 at 15:42
@KillerSnail, it is safe to ignore. To avoid the warning, try: `df = df.drop(['colheading1', 'colheading2'], axis=1)` – Philipp Schwarz Jan 09 '17 at 13:55
The term `axis` explained: https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean. Essentially, `axis=0` is said to be "column-wise" and `axis=1` is "row-wise". – tim-phillips Jun 16 '17 at 18:07
And `inplace=True` means that the `DataFrame` is modified in place. – tim-phillips Jun 16 '17 at 18:07
@KillerSnail if you don't want the warning, do `yourdf = yourdf.drop(['columnheading1', 'columnheading2'], axis=1)` – happy_sisyphus Oct 12 '17 at 16:32
@KillerSnail you have to add `pd.options.mode.chained_assignment = None` after `import pandas as pd` – nick Jul 17 '18 at 19:06
This is less performant on a very large df, as you have to create local copies; this answer works better instead: https://stackoverflow.com/a/28540395/5125264 – Matt Jun 24 '20 at 15:31
Also note that it works only with square brackets, not parentheses. Like this: `yourdf.drop(['columnheading1', 'columnheading2'], axis=1)`, not like this: `yourdf.drop(('columnheading1', 'columnheading2'), axis=1)` – acmpo6ou May 13 '21 at 13:52

score 71 · Accepted Answer · edited Aug 05 '20 at 11:04

I don't know what you mean by inefficient but if you mean in terms of typing it could be easier to just select the cols of interest and assign back to the df:

df = df[cols_of_interest]

Where cols_of_interest is a list of the columns you care about.

Or you can slice the columns and pass this to drop:

df.drop(df.loc[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)

The call to head just selects 0 rows, as we're only interested in the column names rather than the data.
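
The slice-and-drop idea can be sketched on a small frame as follows. The column names and values are hypothetical, and the sketch assumes the column labels are unique so that a label slice is well defined:

```python
import pandas as pd

# Hypothetical frame with three trailing 'Unnamed' columns.
df = pd.DataFrame(
    [[1, 2, None, None, None]],
    columns=["a", "b", "Unnamed: 2", "Unnamed: 3", "Unnamed: 4"],
)

# Label-based slices with .loc include both endpoints.
unwanted = df.loc[:, "Unnamed: 2":"Unnamed: 4"].columns

# drop() returns a new frame; df itself is left as it was.
result = df.drop(unwanted, axis=1)
print(list(result.columns))
```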

update

Another method: It would be simpler to use the boolean mask from str.contains and invert it to mask the columns:

In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df

Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []

In [4]:
~df.columns.str.contains('Unnamed:')

Out[4]:
array([ True, False, False,  True], dtype=bool)

In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]

Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []
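
The same mask can also be passed to `.loc`. This is only a sketch on made-up data: it assumes every column label is a string, and it passes `regex=False` so that the pattern is treated as a literal substring rather than a regular expression:

```python
import pandas as pd

# Hypothetical frame; only the column names matter here.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["a", "Unnamed: 1", "Unnamed: 2", "foo"],
)

# True for the columns to keep, False for the 'Unnamed:' ones.
mask = ~df.columns.str.contains("Unnamed:", regex=False)

# Boolean column selection returns a new DataFrame.
kept = df.loc[:, mask]
print(list(kept.columns))
```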

edited Aug 05 '20 at 11:04

borgr

answered Feb 16 '15 at 09:58

EdChum

I get errors when I try doing either ~df.columns... (TypeError: bad operand type for unary ~: 'str') or df.columns.str.contains... (AttributeError: 'Index' object has no attribute 'str'). Any ideas why this might be? – Dai Jun 03 '17 at 08:37
@EdChum can I create `df = df[cols_of_interest]`, where `cols_of_interest` adds a column name to it every time a for loop iterates? – Feb 22 '18 at 09:52
@Victor no, if you do that you overwrite your `df` with your new column. You should `append` perhaps, but I don't really understand your question; you should post a real question on SO rather than ask in a comment, as that's poor form on SO – EdChum Feb 22 '18 at 09:54
@EdChum you're absolutely right. I have created the question and I am trying to solve it by searching different parts of SO. Here is the link ! any contribution will help https://stackoverflow.com/questions/48923915/create-a-for-loop-that-drops-columns-depending-on-the-feature-contribution – Feb 22 '18 at 09:55

score 57 · Answer 3 · edited Feb 03 '21 at 17:09

My personal favorite, and easier than the answers I have seen here (for multiple columns):

df.drop(df.columns[22:56], axis=1, inplace=True)
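
A small sketch of the positional slice, using a hypothetical six-column frame in place of the real data. Note that `df.columns[2:5]` yields labels, so if column names were duplicated, every column sharing a dropped name would be removed as well:

```python
import pandas as pd

# Hypothetical frame with six columns named c0..c5.
df = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=[f"c{i}" for i in range(6)])

# Positions 2, 3 and 4 are selected; the stop position 5 is excluded.
df.drop(df.columns[2:5], axis=1, inplace=True)
print(list(df.columns))
```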

edited Feb 03 '21 at 17:09

answered Oct 01 '17 at 10:41

sheldonzy

This should be the answer. Cleanest, easiest to read, with straightforward native Pandas indexing syntax. – Brent Faust Oct 05 '17 at 21:44
This answer should have the green tick next to it, not the others. – Siavosh Mahboubian Aug 20 '19 at 23:41

score 22 · Answer 4 · answered Feb 16 '15 at 11:26

This is probably a good way to do what you want. It will delete all columns that contain 'Unnamed' in their header.

for col in df.columns:
    if 'Unnamed' in col:
        del df[col]

answered Feb 16 '15 at 11:26

knightofni

this `for col in df.columns:` can be simplified to `for col in df:`, also the OP has not indicated what the naming scheme is for the other columns, they could all contain 'Unnamed', also this is inefficient as it removes the columns one at a time – EdChum Feb 16 '15 at 11:35
It's certainly not efficient, but as long as we're not working on huge dataframes it won't have a significant impact. The plus point of this method is that it's simple to remember and fast to code - while creating a list of the columns you want to keep can be pretty painful. – knightofni Feb 16 '15 at 11:45
I think this is likely to be most performant on large df because you don't have to make a local copy with `inplace = True` – Matt Jun 24 '20 at 15:31

score 17 · Answer 5 · answered Sep 27 '16 at 19:36

You can do this in one line and one go:

df.drop([col for col in df.columns if "Unnamed" in col], axis=1, inplace=True)

This involves less moving around/copying of the object than the solutions above.
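
If some column labels might not be strings, the `in` test raises a `TypeError` on them. A hedged variant of the one-liner, on a made-up frame with one integer label, guards against that with an `isinstance` check (the guard is an addition here, not part of the answer above):

```python
import pandas as pd

# Hypothetical frame mixing string and integer column labels.
df = pd.DataFrame([[1, 2, 3]], columns=["a", "Unnamed: 1", 7])

# Only string labels are tested; other labels are never dropped.
to_drop = [col for col in df.columns if isinstance(col, str) and "Unnamed" in col]

df.drop(to_drop, axis=1, inplace=True)
print(list(df.columns))
```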

answered Sep 27 '16 at 19:36

Peter

score 13 · Answer 6 · answered Mar 20 '18 at 15:36

Not sure if this solution has been mentioned anywhere yet, but one way to do it is with pandas.Index.difference.

>>> df = pd.DataFrame(columns=['A','B','C','D'])
>>> df
Empty DataFrame
Columns: [A, B, C, D]
Index: []
>>> to_remove = ['A','C']
>>> df = df[df.columns.difference(to_remove)]
>>> df
Empty DataFrame
Columns: [B, D]
Index: []
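
One caveat worth sketching: by default `Index.difference` tries to sort its result, so the remaining columns may come back in a different order than in the original frame. The example below uses hypothetical column names in a deliberately unsorted order and assumes a pandas version that supports the `sort` parameter:

```python
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order.
df = pd.DataFrame(columns=["D", "A", "C", "B"])
to_remove = ["A", "C"]

# Default behaviour: the result is sorted where possible.
sorted_cols = df.columns.difference(to_remove)

# sort=False leaves the remaining labels in their original order.
ordered_cols = df.columns.difference(to_remove, sort=False)

print(list(df[sorted_cols].columns))
print(list(df[ordered_cols].columns))
```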

score 8 · Answer 7 · answered Sep 23 '19 at 05:36

You can pass the column names as a list and specify the axis as 0 or 1:

axis=1: the labels refer to columns
axis=0: the labels refer to rows (the index)
By default axis=0

data.drop(["Colname1","Colname2","Colname3","Colname4"],axis=1)
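
For comparison, here is a short sketch on made-up data showing that `axis=1` and the `columns=` keyword select the same labels, assuming a pandas version that supports the `columns=` keyword of `drop`:

```python
import pandas as pd

# Hypothetical frame; "other" is the only column we want to keep.
data = pd.DataFrame({"Colname1": [1], "Colname2": [2], "other": [3]})

# Two equivalent spellings; both return a new DataFrame.
by_axis = data.drop(["Colname1", "Colname2"], axis=1)
by_keyword = data.drop(columns=["Colname1", "Colname2"])

print(list(by_axis.columns))
print(by_axis.equals(by_keyword))
```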

answered Sep 23 '19 at 05:36

Swaroop Maddu

score 7 · Answer 8 · answered Jan 28 '20 at 15:04

Simple and easy. Remove all columns after the 22nd:

df.drop(columns=df.columns[22:]) # love it

answered Jan 28 '20 at 15:04

Niedson

To modify `df` in place, add the flag `inplace=True`: `df.drop(columns=df.columns[22:], inplace=True)` – arilwan Sep 16 '20 at 16:07

score 1 · Answer 9 · answered Feb 18 '16 at 14:25

The below worked for me:

for col in df:
    if 'Unnamed' in col:
        print(col)  # show which column is being removed
        try:
            df.drop(col, axis=1, inplace=True)
        except Exception:
            pass

answered Feb 18 '16 at 14:25

Shivgan

score 0 · Answer 10 · edited Mar 22 '19 at 20:43

df = df[[col for col in df.columns if 'Unnamed' not in col]]

edited Mar 22 '19 at 20:43

ElSheikh

answered Feb 19 '19 at 15:54

Sarah

This is similar to Peter's except that undesired columns are filtered out instead of dropped. – Sarah Feb 19 '19 at 15:56

score 0 · Answer 11 · answered Jul 17 '22 at 11:20

You can drop all columns that start with 'Unnamed':

df.loc[:, ~df.columns.str.startswith('Unnamed')]

answered Jul 17 '22 at 11:20

Mykola Zotko
