Efficient way to search string contains in multiple columns using pandas

Question

I have a pandas DataFrame like the one shown below:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Adm DateTime': ['02/25/2012', '03/05/1996', '11/12/2010', '31/05/2012', '21/07/2019', '31/10/2020'],
                   's_id': [1, 2, 3, 4, 5, 6],
                   'test_string_1': ['test', 'Thalaivar', 'Superstar', 'God', 'Favorite', 'Rajinikanth'],
                   'test_string_2': ['Rajinikanth', 'God of Cinema', 'Favorite', 'Superstar', 'Rahman', 'ARR']})
df['Adm DateTime'] = pd.to_datetime(df['Adm DateTime'])

I would like to check whether a substring is present in any of the columns (test_string_1 and test_string_2).

I am able to do this for one column, as shown below:

df['op_flag'] = np.where(df['test_string_1'].str.contains('Rajini|God|Thalaivar',case=False),1, 0)

How can I do this across both columns?

Should I repeat the above code with a different column name?

Is there a way to pass in the list of column names that I would like to check?

score 2 · Accepted Answer · answered Jan 03 '21 at 10:00


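You can select the columns you want to check, apply str.contains to each of them with a lambda function, and then combine the per-column results row-wise with any(axis=1):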

In [40]: df[['test_string_1', 'test_string_2']].apply(lambda x: x.str.contains('Rajini|God|Thalaivar',case=False)).any(axis=1).astype(int)
Out[40]:
0    1
1    1
2    0
3    1
4    0
5    1
dtype: int64
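
To store the result as a new column and keep the list of columns configurable, the same idea can be wrapped in a small helper. This is only a minimal sketch: the names flag_any_contains and cols_to_check are hypothetical and not part of pandas, the date column is left out because it is not needed here, and na=False is an added assumption so that missing values count as non-matches.

```python
import pandas as pd

df = pd.DataFrame({
    's_id': [1, 2, 3, 4, 5, 6],
    'test_string_1': ['test', 'Thalaivar', 'Superstar', 'God', 'Favorite', 'Rajinikanth'],
    'test_string_2': ['Rajinikanth', 'God of Cinema', 'Favorite', 'Superstar', 'Rahman', 'ARR'],
})

# Hypothetical names: the columns to search and the regex to look for.
cols_to_check = ['test_string_1', 'test_string_2']
pattern = 'Rajini|God|Thalaivar'


def flag_any_contains(frame, columns, pattern):
    """Return 1 for rows where any of the given columns matches pattern, else 0."""
    # Apply str.contains column by column; na=False treats missing values as no match.
    matches = frame[columns].apply(
        lambda col: col.str.contains(pattern, case=False, na=False)
    )
    # A row is flagged if at least one of its columns matched.
    return matches.any(axis=1).astype(int)


df['op_flag'] = flag_any_contains(df, cols_to_check, pattern)
print(df[['s_id', 'op_flag']])
```

Adding another column to search then only requires extending cols_to_check, so the str.contains call does not have to be repeated per column.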

answered Jan 03 '21 at 10:00

Asish M.
