16

I'm trying to figure out the fastest way to drop columns from df using a list of column names. This is part of a feature reduction technique. This is what I am using now, and it is taking forever. Any suggestions are highly appreciated.

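    # Keep the first 500 names; a plain slice also behaves when len(important) <= 500
    important2 = important[:500]
    for i in important:
        # Drop every column that is not among the kept names, one call per column
        if i not in important2:
            df_reduced.drop(i, axis=1, inplace=True)
    df_reduced.head()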
lrn2code

1 Answer

18

Use a list containing the columns to be dropped, so that drop is called once rather than once per column:

    good_bye_list = ['column_1', 'column_2', 'column_3']
    df_reduced.drop(good_bye_list, axis=1, inplace=True)
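Applied to the question's variables, this might look like the sketch below. The contents of `important` and `df_reduced` are made-up stand-ins (the question does not show them), and the assumption is that the first 500 names in `important` are the ones to keep.

```python
import pandas as pd

# Hypothetical stand-ins for the question's objects: 1000 column names
# and a one-row frame that has all of them as columns.
important = [f"col_{i}" for i in range(1000)]
df_reduced = pd.DataFrame([[0] * len(important)], columns=important)

# Names to keep (assumed: the first 500); a set makes membership tests cheap.
keep = set(important[:500])

# Everything else goes into the drop list, preserving the original order.
good_bye_list = [c for c in important if c not in keep]

# A single drop call instead of one call per column.
df_reduced.drop(good_bye_list, axis=1, inplace=True)
```

Building the list first means the frame is modified once, not once for each of the 500 unwanted columns.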
ℕʘʘḆḽḘ
  • 7
    This is definitely the "best" way to do it; however, any idea why it would take a long time to run? I have a large dataframe (2 million observations, 98 columns) but still... this should be very fast? Unless I'm missing something. It took me 1min+ to delete two columns. – Lucas H May 09 '19 at 19:55
  • 2
    why use a list when .drop provides this functionality? `df_reduced.drop(columns=['column_1', 'column_2', 'column_3'], inplace=True)` that's more pythonic/readable anyway – Marc Maxmeister Dec 19 '19 at 18:16
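For the two comments above, here is a small sketch comparing the `columns=` form with selecting the columns to keep. The frame and its column names are invented for illustration. Whether either form helps with the slowness reported for a large frame is not something this sketch establishes; it would need to be timed on the real data.

```python
import pandas as pd

# Hypothetical toy frame; the real one in the comment has 98 columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# Keyword form from the comment: returns a new frame, leaves df untouched.
dropped = df.drop(columns=["c", "d"])

# Alternative: name the columns to keep instead of the ones to drop.
selected = df[["a", "b"]]

# Both routes give the same result.
assert dropped.equals(selected)
```

Selecting the keep list can be the more direct choice when, as in the question, the columns to keep are already known.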