4

I want to delete columns that start the particular "TYPE" word and do not contain _1?

df =

TYPE_1    TYPE_2    TYPE_3    COL1
aaa       asb       bbb       123

The result should be:

df =

    TYPE_1    COL1
    aaa       123

Currently I am deleting these columns manually, however this approach is not very efficient if the number of columns is big:

df = df.drop(["TYPE_2","TYPE_3"], axis=1)
ayhan
  • 70,170
  • 20
  • 182
  • 203
Dinosaurius
  • 8,306
  • 19
  • 64
  • 113

4 Answers4

8

A list comprehension can be used. Note: axis=1 denotes that we are referring to the column and inplace=True can also be used as per pandas.DataFrame.drop docs.

droplist = [i for i in df.columns if i.startswith('TYPE') and '_1' not in i]
df1.drop(droplist,axis=1,inplace=True)
nipy
  • 5,138
  • 5
  • 31
  • 72
  • For some reason it does not work for columns that are named as, e.g. `TYPE_MOD`. So, the columns containing `_` are ignored. Why? – Dinosaurius Dec 23 '16 at 20:41
  • I just tested this and my column named 'TYPE_MOD' was dropped. Is there whitespace in the 'TYPE_MOD' string maybe? What do you get when entering `df.columns[df.columns == 'TYPE_MOD']` – nipy Dec 23 '16 at 20:54
6

This is the fifth answer but I wanted to showcase the power of the filter dataframe method which filters by column names with regex. This searches for columns that don't start with TYPE or have _1 somewhere in them.

df.filter(regex='^(?!TYPE)|_1')
Ted Petrou
  • 59,042
  • 19
  • 131
  • 136
  • Can you break down this regex? Thanks. – julieth Dec 24 '16 at 03:39
  • 1
    The regex has two parts separated by an or `|`. First half is `^(?!TYPE)` which is a [negative look-ahead](http://stackoverflow.com/questions/899422/regular-expression-for-a-string-that-does-not-start-with-a-sequence) used to find expressions that do not start with TYPE. The second part is simply looking for any matches with `_1` – Ted Petrou Dec 24 '16 at 04:58
2

Easy:

unwanted = [column for column in df.columns 
            if column.startswith("TYPE") and "_1" not in column]
df = df.drop(unwanted)
DYZ
  • 55,249
  • 10
  • 64
  • 93
1
t_cols = [c for c in df.columns.values if c.startswith('TYPE_') and not c == 'TYPE_1']
df.drop(t_cols)

Should do the job

Steve Barnes
  • 27,618
  • 6
  • 63
  • 73