
I have some data and when I import it, I get the following unneeded columns. I'm looking for an easy way to delete all of these.

'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
'Unnamed: 28', 'Unnamed: 29', 'Unnamed: 30', 'Unnamed: 31',
'Unnamed: 32', 'Unnamed: 33', 'Unnamed: 34', 'Unnamed: 35',
'Unnamed: 36', 'Unnamed: 37', 'Unnamed: 38', 'Unnamed: 39',
'Unnamed: 40', 'Unnamed: 41', 'Unnamed: 42', 'Unnamed: 43',
'Unnamed: 44', 'Unnamed: 45', 'Unnamed: 46', 'Unnamed: 47',
'Unnamed: 48', 'Unnamed: 49', 'Unnamed: 50', 'Unnamed: 51',
'Unnamed: 52', 'Unnamed: 53', 'Unnamed: 54', 'Unnamed: 55',
'Unnamed: 56', 'Unnamed: 57', 'Unnamed: 58', 'Unnamed: 59',
'Unnamed: 60'

The columns are 0-indexed, so I tried something like:

df.drop(df.columns[[22, 23, 24, 25,
                    26, 27, 28, 29, 30, 31, 32, 55]], axis=1, inplace=True)

But this isn't very efficient. I tried writing some for loops, but that struck me as bad pandas practice, hence the question.

I've seen some similar examples (Drop multiple columns in pandas), but they don't answer my question.

Trenton McKinney
Peadar Coyle
  • What do you mean, efficient? Is it running too slow? If your problem is that you don't want to get the indices of all the columns that you want to delete, please note that you can just give `df.drop` a list of column names: `df.drop(['Unnamed: 24', 'Unnamed: 25', ...], axis=1)` – Carsten Feb 16 '15 at 09:53
  • Would it not be easier to just subset the columns of interest: i.e. `df = df[cols_of_interest]`, otherwise you could slice the df by columns and get the columns `df.drop(df.loc[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)` – EdChum Feb 16 '15 at 09:56
  • I meant inefficient in terms of typing or 'bad code smell' – Peadar Coyle Feb 17 '15 at 11:03
  • Might be worth noting that in most cases it's easier just to keep the columns you want than to delete the ones that you don't: df = df[col_list] – sparrow Apr 27 '18 at 22:14

11 Answers


By far the simplest approach is:

yourdf.drop(['columnheading1', 'columnheading2'], axis=1, inplace=True)
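A minimal sketch of the same call on a small made-up DataFrame (the column names are hypothetical). `drop` also accepts a `columns=` keyword, which is equivalent to passing `axis=1`, and without `inplace=True` it returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

# Hypothetical frame with two unwanted columns
df = pd.DataFrame({"keep": [1, 2], "Unnamed: 1": [None, None], "Unnamed: 2": [None, None]})

to_drop = ["Unnamed: 1", "Unnamed: 2"]

# Both calls return a new DataFrame; df itself is not modified
by_axis = df.drop(to_drop, axis=1)
by_keyword = df.drop(columns=to_drop)

print(list(by_axis.columns))     # ['keep']
print(list(by_keyword.columns))  # ['keep']
print(list(df.columns))          # ['keep', 'Unnamed: 1', 'Unnamed: 2']
```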
Scarabee
Philipp Schwarz
  • I used this format in some of my code and I get a `SettingWithCopyWarning` warning? – KillerSnail Jan 08 '17 at 15:42
  • @KillerSnail, it is safe to ignore. To avoid the warning, try: df = df.drop(['colheading1', 'colheading2'], axis=1) – Philipp Schwarz Jan 09 '17 at 13:55
  • The term `axis` explained: https://stackoverflow.com/questions/22149584/what-does-axis-in-pandas-mean. For `drop`, `axis=0` means the labels are looked up in the row index and `axis=1` means they are looked up in the columns. – tim-phillips Jun 16 '17 at 18:07
  • And `inplace=True` means that the `DataFrame` is modified in place. – tim-phillips Jun 16 '17 at 18:07
  • @KillerSnail if you don't want the warning, do `yourdf = yourdf.drop(['columnheading1', 'columnheading2'], axis=1)` – happy_sisyphus Oct 12 '17 at 16:32
  • @KillerSnail you can silence the warning by adding `pd.options.mode.chained_assignment = None` after `import pandas as pd` – nick Jul 17 '18 at 19:06
  • This is less performant on very large DataFrames, as you have to create local copies; this answer works better instead: https://stackoverflow.com/a/28540395/5125264 – Matt Jun 24 '20 at 15:31
  • Also note that it works only with square brackets (a list), not parentheses (a tuple). Like this: `yourdf.drop(['columnheading1', 'columnheading2'], axis=1)`, not like this: `yourdf.drop(('columnheading1', 'columnheading2'), axis=1)` – acmpo6ou May 13 '21 at 13:52

I don't know what you mean by inefficient, but if you mean in terms of typing, it could be easier to just select the columns of interest and assign the result back to the df:

df = df[cols_of_interest]

Where cols_of_interest is a list of the columns you care about.

Or you can slice the columns and pass this to drop:

df.drop(df.loc[:,'Unnamed: 24':'Unnamed: 60'].head(0).columns, axis=1)

The call to head just selects 0 rows, as we're only interested in the column names rather than the data.
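`.ix` was removed in pandas 1.0, so on current versions the label slice is written with `.loc`. A sketch on a hypothetical frame where the unwanted columns form one contiguous block; note that `.loc` label slices include both endpoints:

```python
import pandas as pd

# Hypothetical frame: the unwanted columns sit next to each other
df = pd.DataFrame(columns=["a", "Unnamed: 1", "Unnamed: 2", "Unnamed: 3", "b"])

# .loc label slicing includes both 'Unnamed: 1' and 'Unnamed: 3'
block = df.loc[:, "Unnamed: 1":"Unnamed: 3"].columns
df = df.drop(block, axis=1)

print(list(df.columns))  # ['a', 'b']
```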

update

Another method: It would be simpler to use the boolean mask from str.contains and invert it to mask the columns:

In [2]:
df = pd.DataFrame(columns=['a','Unnamed: 1', 'Unnamed: 1','foo'])
df

Out[2]:
Empty DataFrame
Columns: [a, Unnamed: 1, Unnamed: 1, foo]
Index: []

In [4]:
~df.columns.str.contains('Unnamed:')

Out[4]:
array([ True, False, False,  True], dtype=bool)

In [5]:
df[df.columns[~df.columns.str.contains('Unnamed:')]]

Out[5]:
Empty DataFrame
Columns: [a, foo]
Index: []
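The same inverted mask can be passed straight to `.loc` to keep the wanted columns. A sketch, assuming every column label is a string; `regex=False` makes `contains` treat the pattern as a plain substring rather than a regular expression:

```python
import pandas as pd

df = pd.DataFrame(columns=["a", "Unnamed: 1", "Unnamed: 2", "foo"])

# True for every label containing the literal substring 'Unnamed:'
is_unnamed = df.columns.str.contains("Unnamed:", regex=False)

# Keep every column whose label is NOT flagged
df = df.loc[:, ~is_unnamed]

print(list(df.columns))  # ['a', 'foo']
```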
borgr
EdChum
  • I get errors when I try doing either ~df.columns... (TypeError: bad operand type for unary ~: 'str') or df.columns.str.contains... (AttributeError: 'Index' object has no attribute 'str'). Any ideas why this might be? – Dai Jun 03 '17 at 08:37
  • @EdChum can I create `df = df[cols_of_interest]`, where `cols_of_interest` adds a column name to it every time a for loop iterates? –  Feb 22 '18 at 09:52
  • @Victor no, if you do that you overwrite your `df` with your new column. Perhaps you should `append`, but I don't really understand your question. You should post a real question on SO rather than ask in a comment, as that's poor form on SO – EdChum Feb 22 '18 at 09:54
  • @EdChum you're absolutely right. I have created the question and I am trying to solve it by searching different parts of SO. Here is the link! Any contribution will help: https://stackoverflow.com/questions/48923915/create-a-for-loop-that-drops-columns-depending-on-the-feature-contribution –  Feb 22 '18 at 09:55

My personal favorite, and easier than the answers I have seen here (for multiple columns):

df.drop(df.columns[22:56], axis=1, inplace=True)
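Python slices exclude their end position, so `df.columns[22:56]` covers positions 22 through 55. A small sketch with a made-up frame makes the boundaries visible:

```python
import pandas as pd

# Hypothetical frame with columns c0 ... c5
df = pd.DataFrame(columns=[f"c{i}" for i in range(6)])

# Positions 2 and 3 are dropped; position 4 is the excluded slice end
df = df.drop(df.columns[2:4], axis=1)

print(list(df.columns))  # ['c0', 'c1', 'c4', 'c5']
```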
sheldonzy

This is probably a good way to do what you want. It will delete all columns that contain 'Unnamed' in their header.

for col in df.columns:
    if 'Unnamed' in col:
        del df[col]
knightofni
  • this `for col in df.columns:` can be simplified to `for col in df:`, also the OP has not indicated what the naming scheme is for the other columns, they could all contain 'Unnamed', also this is inefficient as it removes the columns one at a time – EdChum Feb 16 '15 at 11:35
  • It's certainly not efficient, but as long as we're not working on huge dataframes it won't have a significant impact. The plus point of this method is that it's simple to remember and fast to code - while creating a list of the columns you want to keep can be pretty painful. – knightofni Feb 16 '15 at 11:45
  • I think this is likely to be most performant on large df because you don't have to make a local copy with `inplace = True` – Matt Jun 24 '20 at 15:31

You can do this in one line and one go:

df.drop([col for col in df.columns if "Unnamed" in col], axis=1, inplace=True)

This involves less moving around/copying of the object than the solutions above.
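One caveat, assuming the frame may also have non-string column labels (for example integers): `'Unnamed' in col` raises `TypeError` for those. A sketch of the same one-liner that only tests string labels:

```python
import pandas as pd

# Hypothetical frame mixing string and integer column labels
df = pd.DataFrame([[1, None, 3]], columns=["a", "Unnamed: 1", 0])

# Only string labels can contain 'Unnamed'; any other label is kept
unnamed = [col for col in df.columns if isinstance(col, str) and "Unnamed" in col]
df.drop(unnamed, axis=1, inplace=True)

print(list(df.columns))  # ['a', 0]
```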

Peter

Not sure if this solution has been mentioned anywhere yet, but one way to do it is pandas.Index.difference.

>>> df = pd.DataFrame(columns=['A','B','C','D'])
>>> df
Empty DataFrame
Columns: [A, B, C, D]
Index: []
>>> to_remove = ['A','C']
>>> df = df[df.columns.difference(to_remove)]
>>> df
Empty DataFrame
Columns: [B, D]
Index: []
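Note that `Index.difference` sorts its result by default, so the remaining columns can come back in a different order than in the original frame; passing `sort=False` keeps the original order. A sketch with a deliberately unsorted, hypothetical column order:

```python
import pandas as pd

df = pd.DataFrame(columns=["D", "B", "A", "C"])
to_remove = ["A", "C"]

# Default: the result is sorted
sorted_cols = df.columns.difference(to_remove)

# sort=False keeps the frame's original column order
kept = df[df.columns.difference(to_remove, sort=False)]

print(list(sorted_cols))   # ['B', 'D']
print(list(kept.columns))  # ['D', 'B']
```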
px06

You can just pass the column names as a list and specify the axis as 0 or 1:

  • axis=1: the labels are column names (drops columns)
  • axis=0: the labels are row (index) labels (drops rows)
  • By default axis=0

    data.drop(["Colname1","Colname2","Colname3","Colname4"],axis=1)
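To see what `axis` controls in `drop`, a sketch on a tiny hypothetical frame: the default `axis=0` looks the labels up in the row index, while `axis=1` looks them up in the columns. The `columns=` keyword is an explicit alternative to `axis=1`:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["c1", "c2"])

rows_dropped = df.drop(["r1"])            # axis=0 (default): drops row 'r1'
cols_dropped = df.drop(["c1"], axis=1)    # axis=1: drops column 'c1'
same_as_above = df.drop(columns=["c1"])   # keyword form of axis=1

print(list(rows_dropped.index))    # ['r2']
print(list(cols_dropped.columns))  # ['c2']
```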

Swaroop Maddu

Simple and easy: remove all columns after the 22nd.

df.drop(columns=df.columns[22:]) # love it
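An equivalent positional selection keeps the first 22 columns with `iloc` instead of dropping the rest. A sketch on a hypothetical 30-column frame:

```python
import pandas as pd

# Hypothetical frame with 30 columns named col0 ... col29
df = pd.DataFrame(columns=[f"col{i}" for i in range(30)])

dropped = df.drop(columns=df.columns[22:])  # remove positions 22 onward
kept = df.iloc[:, :22]                      # keep positions 0 to 21

print(dropped.shape[1])                             # 22
print(list(dropped.columns) == list(kept.columns))  # True
```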
Niedson
  • To modify `df` in place, add the flag `inplace=True`, so that: `df.drop(columns=df.columns[22:], inplace=True)` – arilwan Sep 16 '20 at 16:07

The below worked for me:

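for col in df:  # iterating over a DataFrame yields its column labels
    if 'Unnamed' in col:
        #del df[col]
        print(col)
        try:
            df.drop(col, axis=1, inplace=True)
        except Exception:
            pass  # ignore any column that could not be dropped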
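The `try`/`except` above only guards against `drop` failing, for example on a label that is missing. `drop` has an `errors` parameter for exactly that case, so a sketch without the loop could look like this (the extra label `'Unnamed: 99'` is hypothetical and deliberately absent from the frame):

```python
import pandas as pd

df = pd.DataFrame(columns=["a", "Unnamed: 1", "foo"])

# Collect candidate labels; 'Unnamed: 99' does not exist in df
candidates = [col for col in df.columns if "Unnamed" in col] + ["Unnamed: 99"]

# errors="ignore" skips labels that are not present instead of raising KeyError
df = df.drop(columns=candidates, errors="ignore")

print(list(df.columns))  # ['a', 'foo']
```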
Shivgan

df = df[[col for col in df.columns if 'Unnamed' not in col]]
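A related filtering option is `DataFrame.filter` with a regular expression: the negative lookahead `^(?!Unnamed)` matches only labels that do not start with `Unnamed`. A sketch, assuming all column labels are strings:

```python
import pandas as pd

df = pd.DataFrame(columns=["a", "Unnamed: 1", "foo", "Unnamed: 2"])

# filter() keeps the columns whose label matches the regex (searched with re.search)
df = df.filter(regex=r"^(?!Unnamed)")

print(list(df.columns))  # ['a', 'foo']
```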

ElSheikh
Sarah
  • This is similar to Peter's except that undesired columns are filtered out instead of dropped. – Sarah Feb 19 '19 at 15:56

You can drop all columns that start with 'Unnamed' by keeping only the others:

df = df.loc[:, ~df.columns.str.startswith('Unnamed')]
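If the `Unnamed: N` columns come from empty header cells in a CSV file, which is how `read_csv` usually produces them, they can also be skipped while reading. A sketch, assuming the data is loaded with `pd.read_csv` (the inline CSV text is made up): a callable `usecols` is evaluated against each column name, and only the columns for which it returns `True` are kept.

```python
import io

import pandas as pd

# Made-up CSV whose second header cell is empty, so pandas names it 'Unnamed: 1'
raw = "a,,b\n1,,2\n3,,4\n"

# Skip every column whose generated name starts with 'Unnamed'
df = pd.read_csv(io.StringIO(raw), usecols=lambda name: not name.startswith("Unnamed"))

print(list(df.columns))  # ['a', 'b']
```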
Mykola Zotko