
This is related to, but different from, another post.

import pandas as pd

data = {'spike-2': [1,2,3], 'hey spike': [4,5,6], 'spiked-in': [7,8,9], 'no': [10,11,12]}
df = pd.DataFrame(data)

I want to select columns by name, keeping a column only if its name matches more than one substring criterion (all of them, not just any one).

I tried to use the AND operator, i.e. &:

spike_cols = [col for col in df.columns if ('spike') & ('hey') in col]

so that I get precisely the one column 'hey spike'. I also tried

dfnew = df.filter(regex='spike'&'hey')

but I get this error:

TypeError: unsupported operand type(s) for &: 'str' and 'str'

inigam
  • Sorry are you after: `df.loc[:,df.columns.str.contains('spike|hey')]`? or in fact: `df.filter(regex='spike|hey')`? – EdChum Apr 20 '17 at 15:28
  • @EdChum I wasn't quite looking for an OR; there were already many solutions for OR. I was looking for an AND condition. – inigam Apr 20 '17 at 16:08

1 Answer


Here is a method without regex. Use `in` to check each substring criterion and combine the checks with `and`:

df[[col for col in df.columns if 'hey' in col and 'spike' in col]]
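If there are more than two substrings, the same idea can be written with `all()`. The following is a minimal sketch, not part of the original answer: the helper name `select_columns_containing_all` is hypothetical, and the sample frame assumes the target column is spelled 'hey spike' as described in the question.

```python
import pandas as pd

# Sample frame modeled on the question; 'hey spike' is the column the asker wants.
df = pd.DataFrame({
    "spike-2": [1, 2, 3],
    "hey spike": [4, 5, 6],
    "spiked-in": [7, 8, 9],
    "no": [10, 11, 12],
})

def select_columns_containing_all(frame, substrings):
    """Hypothetical helper: keep columns whose names contain every substring."""
    matching = [col for col in frame.columns if all(s in col for s in substrings)]
    return frame[matching]

result = select_columns_containing_all(df, ["hey", "spike"])
print(list(result.columns))
```

Passing a single-item list such as `["spike"]` falls back to an ordinary substring match, so the same helper covers both cases.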


Or if you want to use regex, you can do:

df.filter(regex='(?=.*hey)(?=.*spike)')
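Each `(?=.*X)` is a lookahead that only asserts X occurs somewhere later in the name, so chaining them behaves like AND and the order of the groups should not matter. A small sketch to check this, again assuming the column is spelled 'hey spike':

```python
import pandas as pd

# Sample frame modeled on the question (column spelling is an assumption).
df = pd.DataFrame({
    "spike-2": [1, 2, 3],
    "hey spike": [4, 5, 6],
    "spiked-in": [7, 8, 9],
    "no": [10, 11, 12],
})

# Both lookaheads must succeed for a column name to match.
both = df.filter(regex=r"(?=.*hey)(?=.*spike)")

# Swapping the lookaheads selects the same columns.
swapped = df.filter(regex=r"(?=.*spike)(?=.*hey)")

print(list(both.columns))
print(list(swapped.columns))
```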

See this answer.


Psidom
  • Thank you, this is my first Stack question. I also adapted your answer to use variables within the regex, like df.filter(regex='(?=.*'+a+')(?=.*'+b+')') – inigam Apr 20 '17 at 16:05
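When the substrings come from variables, as in the comment above, it may be safer to pass them through `re.escape` so that characters like `.` are matched literally. This is only a sketch under assumptions: the helper name `filter_columns_with_all` and the sample column names are made up for illustration.

```python
import re
import pandas as pd

def filter_columns_with_all(frame, substrings):
    """Hypothetical helper: build one lookahead per substring, escaped literally."""
    pattern = "".join(f"(?=.*{re.escape(s)})" for s in substrings)
    return frame.filter(regex=pattern)

# Illustrative frame: only the first name contains the literal text 'v1.0'.
df = pd.DataFrame({
    "hey spike v1.0": [1, 2, 3],
    "hey spike v1x0": [4, 5, 6],
    "no": [7, 8, 9],
})

a, b = "hey", "v1.0"
escaped = filter_columns_with_all(df, [a, b])

# Plain concatenation treats '.' as "any character" and also matches 'v1x0'.
unescaped = df.filter(regex="(?=.*" + a + ")(?=.*" + b + ")")

print(list(escaped.columns))
print(list(unescaped.columns))
```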