1

I have a dataframe with columns that follow certain naming convention. I want to keep only those that have 'out' and 'agg' as prefixes in the header.

I've drafted the following code to achieve this. I created a list so that I can make this a small function and call it for any combination of col prefixes that I want to extract.

prefix = ['out', 'agg']
cols = []
for pref in prefix:
    cols = cols + [col for col in df.columns if pref in col]
df = df[cols].dropna(how='all', axis=0)

Is there a shorter/faster way to do this? I liked the solutions here:Drop columns whose name contains a specific string from pandas DataFrame but couldn't make them work for a list of strings.

thanks

SModi
  • 125
  • 14

1 Answers1

2

Use DataFrame.filter with regex for match columns names by strings joined by | for regex or:

df = pd.DataFrame({
        'A_out':list('abcdef'),
         'B_out':[4,5,4,5,5,4],
         'C_agg':[7,8,9,4,2,3],
         'agg_D':[1,3,5,7,1,0],
         'out_E':[5,3,6,9,2,4],
         'F_agg':list('aaabbb')
})

prefix = ['out', 'agg']

If need match values for any positions in columns names:

df0 = df.filter(regex='|'.join(prefix)).dropna(how='all')
print (df0)
  A_out  B_out  C_agg  agg_D  out_E F_agg
0     a      4      7      1      5     a
1     b      5      8      3      3     a
2     c      4      9      5      6     a
3     d      5      4      7      9     b
4     e      5      2      1      2     b
5     f      4      3      0      4     b

If need only suffixes add $ for match end of strings:

df1 = df.filter(regex='|'.join(f'{x}$' for x in prefix)).dropna(how='all')
print (df1)
  A_out  B_out  C_agg F_agg
0     a      4      7     a
1     b      5      8     a
2     c      4      9     a
3     d      5      4     b
4     e      5      2     b
5     f      4      3     b

If need only prefixes add ^ for match start of strings:

df2 = df.filter(regex='|'.join(f'^{x}' for x in prefix)).dropna(how='all')
print (df2)
   agg_D  out_E
0      1      5
1      3      3
2      5      6
3      7      9
4      1      2
5      0      4
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 2
    The only modification I would make would be enforcing the prefixes/suffixes to be actual prefixes/sufixes (start/end of the string) in the regex. – Adirio Sep 28 '20 at 13:13
  • Thanks this works, I went for the third option since they're prefixes. If I understand the code correctly: regex='|'.join(prefix) will do this: regex = out | agg. Correct? What does ^ do? Why can't I use ${x} instead? – SModi Sep 28 '20 at 13:28
  • 1
    @SModi - Because in regex `$` is special char match end of string - so need add it to end - string + special char for end of string – jezrael Sep 28 '20 at 13:35