Python: How to select DataFrame columns by name when the name matches more than one substring

Question

This is related to, but different from, another post.

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spike': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

I want to select columns by name when the column name matches more than one substring criterion.

I tried to use the AND operator, i.e. &:

spike_cols = [col for col in df.columns if ('spike') & ('hey') in col]

so that I get exactly the one column, 'hey spike'. I also tried:

dfnew = df.filter(regex='spike'&'hey')

but I get this error:

TypeError: unsupported operand type(s) for &: 'str' and 'str'

Sorry, are you after `df.loc[:, df.columns.str.contains('spike|hey')]`, or in fact `df.filter(regex='spike|hey')`? – EdChum, Apr 20 '17 at 15:28
@EdChum I wasn't looking for an OR; there were many solutions for OR. I was looking for an AND condition. – inigam, Apr 20 '17 at 16:08

score 1 · Accepted Answer · edited May 23 '17 at 10:31

Here is a method without regex; just use `in` to check each substring criterion:

df[[col for col in df.columns if 'hey' in col and 'spike' in col]]

Or if you want to use regex, you can do:

df.filter(regex='(?=.*hey)(?=.*spike)')
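As a quick check, here is a minimal sketch that runs both approaches on the question's frame. It assumes the second label is meant to be 'hey spike', as the question text says, and the names `by_substring`, `by_regex` and `by_or` are only illustrative. Each `(?=.*X)` is a lookahead that asserts X occurs somewhere in the label without consuming any characters, so chaining two of them behaves like AND, while `'spike|hey'` is an OR.

```python
import pandas as pd

# Frame from the question, assuming the second label is 'hey spike'
data = {'spike-2': [1, 2, 3], 'hey spike': [4, 5, 6],
        'spiked-in': [7, 8, 9], 'no': [10, 11, 12]}
df = pd.DataFrame(data)

# Plain substring test: keep a label only if it contains both substrings
by_substring = df[[col for col in df.columns if 'hey' in col and 'spike' in col]]

# Regex test: two chained lookaheads, both of which must succeed (AND)
by_regex = df.filter(regex='(?=.*hey)(?=.*spike)')

# For contrast, alternation keeps a label if either substring is present (OR)
by_or = df.filter(regex='spike|hey')

print(list(by_substring.columns))  # ['hey spike']
print(list(by_regex.columns))      # ['hey spike']
print(list(by_or.columns))         # ['spike-2', 'hey spike', 'spiked-in']
```

The attempts in the question fail because `&` is evaluated on the two string literals before `in` is applied, and Python strings do not support `&`.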

answered Apr 20 '17 at 15:30 by Psidom · edited May 23 '17 at 10:31 by Community

Thank you, this was my first Stack question. I also adapted your answer to use variables within the regex, like df.filter(regex='(?=.*'+a+')(?=.*'+b+')') – inigam, Apr 20 '17 at 16:05
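If the substrings come from variables, one possible way to build the pattern is sketched below. The helper name `all_substrings_pattern` is hypothetical and not part of pandas. It wraps each substring in `re.escape` so that characters such as `.` or `+` are matched literally, which plain string concatenation would not guarantee.

```python
import re

def all_substrings_pattern(substrings):
    """Build a regex that matches only if every substring is present."""
    # One lookahead per substring; re.escape keeps regex metacharacters literal
    return ''.join('(?=.*{})'.format(re.escape(s)) for s in substrings)

a, b = 'hey', 'spike'
pattern = all_substrings_pattern([a, b])
print(pattern)  # (?=.*hey)(?=.*spike)

# Check the pattern against the column labels with re.search
labels = ['spike-2', 'hey spike', 'spiked-in', 'no']
matches = [label for label in labels if re.search(pattern, label)]
print(matches)  # ['hey spike']
```

The resulting string can then be passed as `df.filter(regex=pattern)`, which keeps the labels for which `re.search` finds a match.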