What is the fastest way to drop columns in a pandas DataFrame from a list of column names?

Question

I'm trying to figure out the fastest way to drop columns in df using a list of column names. It's part of a fancy feature reduction technique. This is what I am using now, and it is taking forever. Any suggestions are highly appreciated.

    # Keep the first 500 names in `important`; drop every other column, one at a time.
    important2 = important[:500]
    for i in important:
        if i not in important2:
            df_reduced.drop(i, axis=1, inplace=True)
    df_reduced.head()

Using `del df[col]` was hundreds of times faster for me than using `df.drop(col)`. – DavidSilverberg, Sep 02 '22 at 16:10
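
A rough way to check claims like this on your own machine is to time the variants side by side. The sketch below is only an illustration: the frame size, the `c0`…`c49` column names, and the helper function names are all made up, and each timing also includes building the frame, so only large differences are meaningful.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame():
    # Hypothetical stand-in data: 10,000 rows, 50 float columns.
    rng = np.random.default_rng(0)
    return pd.DataFrame(rng.random((10_000, 50)),
                        columns=[f"c{i}" for i in range(50)])


to_drop = [f"c{i}" for i in range(10, 50)]


def drop_in_loop():
    # One drop() call per column, as in the question.
    df = make_frame()
    for col in to_drop:
        df.drop(col, axis=1, inplace=True)
    return df


def del_in_loop():
    # One `del` per column, as suggested in the comment above.
    df = make_frame()
    for col in to_drop:
        del df[col]
    return df


def drop_once():
    # A single drop() call with the whole list.
    df = make_frame()
    return df.drop(columns=to_drop)


for fn in (drop_in_loop, del_in_loop, drop_once):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.3f}s")
```

All three functions should leave the same ten columns behind; only the time taken differs, and the ratio will depend on your pandas version and data.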

ℕʘʘḆḽḘ · Accepted Answer · 2018-06-29T15:05:21.363

18

Use a list containing the columns to be dropped:

    good_bye_list = ['column_1', 'column_2', 'column_3']
    df_reduced.drop(good_bye_list, axis=1, inplace=True)
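
Applied to the setup in the question, a minimal sketch might look like the following. The toy frame, the `feat_` column names, and the cutoff `n_keep` are assumptions for illustration (the question keeps 500 columns), and `important` is assumed to be ordered from most to least important.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 1,000 rows, 8 feature columns.
rng = np.random.default_rng(0)
columns = [f"feat_{i}" for i in range(8)]
df_reduced = pd.DataFrame(rng.random((1000, 8)), columns=columns)

# Assumed: `important` lists column names ranked most to least important.
important = columns
n_keep = 3  # the question uses 500

# Everything after the first n_keep names is dropped in ONE call,
# instead of calling drop() once per column inside a loop.
to_drop = important[n_keep:]
df_reduced = df_reduced.drop(columns=to_drop)

print(df_reduced.columns.tolist())
```

Building the list first and dropping once avoids repeating the per-call overhead of `drop` for every column.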

edited Jun 29 '18 at 15:05

answered Nov 15 '16 at 02:51

ℕʘʘḆḽḘ



This is definitely the "best" way to do it; however, any idea why it would take a long time to run? I have a large dataframe (2 million observations, 98 columns), but still... this should be very fast, unless I'm missing something. It took me 1min+ to delete two columns. – Lucas H May 09 '19 at 19:55

Why use a list when `.drop` provides this functionality? `df_reduced.drop(columns=['column_1', 'column_2', 'column_3'], inplace=True)` is more Pythonic/readable anyway. – Marc Maxmeister Dec 19 '19 at 18:16
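
For what it's worth, the `axis=1` spelling from the answer and the `columns=` keyword from this comment should give the same result. A small check on a made-up three-column frame (the data and the name `not_a_column` are purely illustrative):

```python
import pandas as pd

df = pd.DataFrame({"column_1": [1, 2], "column_2": [3, 4], "column_3": [5, 6]})
good_bye_list = ["column_1", "column_2"]

# Positional labels with axis=1 versus the columns= keyword.
by_axis = df.drop(good_bye_list, axis=1)
by_keyword = df.drop(columns=good_bye_list)
assert by_axis.equals(by_keyword)

# errors="ignore" skips names that are not in the frame instead of raising KeyError.
safe = df.drop(columns=good_bye_list + ["not_a_column"], errors="ignore")
print(safe.columns.tolist())
```

Either way a list of names is passed in; the keyword form just makes it explicit that the labels refer to columns.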
