
I have a DataFrame column of strings (sentences) that is read from an Excel file, and a list of strings (keywords):

df['Purpose'] = ['Central Team Offiste Material - R2 Strengths Profiler test x 7', 'Project Green conference', 'had to book flight as late for flight due to transportation', 'Dublin Transition', 'Training - Dublin transition', 'HRLT Offsite in Dublin - seat choice', 'Baggage fare plus upgrade in flight class', 'Due to a family emergency Jeremy needed to fly home earlier', 'flight back to london after various clients meeting', 'Travel to UK']

and

Rule2_list=['Dublin', 'stakeholders', 'Travel', 'interviews', 'workshop', 'due-diligence', 'business trip', 'client', 'risk']

I want to check whether any element of Rule2_list is present in df['Purpose'].

How do I achieve that? Any help is much appreciated.

Bharath M Shetty
    This needs the combined behaviour of `.str.contains` and `.isin`; `isin` alone doesn't solve the problem, so this is not a full duplicate. A solution would be `df['Purpose'].str.contains('|'.join(Rule2_list)).any()` – bubble Apr 10 '19 at 05:02
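To illustrate the point in the comment above, here is a minimal sketch of how `isin` and `str.contains` differ. The three-row Series `purpose` and the two-item list `keywords` are shortened stand-ins for the data in the question, used only for illustration.

```python
import pandas as pd

# Shortened stand-ins for df['Purpose'] and Rule2_list (illustrative only).
purpose = pd.Series(['Dublin Transition', 'Travel to UK', 'Project Green conference'])
keywords = ['Dublin', 'Travel']

# isin tests whether the whole cell equals one of the keywords,
# so a sentence that merely contains a keyword does not match.
exact = purpose.isin(keywords)

# str.contains treats the pattern as a regular expression;
# 'Dublin|Travel' matches either word anywhere inside the cell.
partial = purpose.str.contains('|'.join(keywords))

print(exact.tolist())    # [False, False, False]
print(partial.tolist())  # [True, True, False]
```

Calling `.any()` on `partial` reduces the row-wise result to a single True/False, as in the comment.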

1 Answer

You can use `Series.str.contains`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html#pandas.Series.str.contains

 import pandas as pd
 df = pd.DataFrame()

 df['Purpose'] = ['Central Team Offiste Material - R2 Strengths Profiler test x 7', 
   'Project Green conference', 'had to book flight as late for flight due to transportation', 'Dublin Transition', 'Training - Dublin transition', 
   'HRLT Offsite in Dublin - seat choice', 'Baggage fare plus upgrade in flight class', 'Due to a family emergency Jeremy needed to fly home earlier', 
   'flight back to london after various clients meeting', 'Travel to UK']

 Rule2_list=['Dublin', 'stakeholders', 'Travel', 'interviews', 'workshop', 'due-diligence', 'business trip', 'client', 'risk']

 # Row-wise check for the first keyword only ('Dublin')
 df['Purpose'].str.contains(Rule2_list[0])

 0    False
 1    False
 2    False
 3     True
 4     True
 5     True
 6    False
 7    False
 8    False
 9    False
 Name: Purpose, dtype: bool
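The call above tests only the first keyword. A possible extension to the whole list, following the `'|'.join(...)` idea from the comment on the question, is sketched below. The column names `has_keyword` and `matched`, the case-insensitive matching, and `na=False` (for empty Excel cells) are assumptions added for illustration, not part of the original question.

```python
import re
import pandas as pd

df = pd.DataFrame({'Purpose': [
    'Central Team Offiste Material - R2 Strengths Profiler test x 7',
    'Project Green conference',
    'had to book flight as late for flight due to transportation',
    'Dublin Transition',
    'Training - Dublin transition',
    'HRLT Offsite in Dublin - seat choice',
    'Baggage fare plus upgrade in flight class',
    'Due to a family emergency Jeremy needed to fly home earlier',
    'flight back to london after various clients meeting',
    'Travel to UK',
]})

Rule2_list = ['Dublin', 'stakeholders', 'Travel', 'interviews', 'workshop',
              'due-diligence', 'business trip', 'client', 'risk']

# Escape each keyword so it is matched literally, then join with '|' (regex OR).
pattern = '|'.join(re.escape(k) for k in Rule2_list)

# Row-wise mask: True where the sentence contains at least one keyword.
# case=False and na=False are assumptions (ignore case, treat missing cells as no match).
df['has_keyword'] = df['Purpose'].str.contains(pattern, case=False, na=False)

# Hypothetical helper column listing the keywords found in each row.
df['matched'] = df['Purpose'].str.findall(pattern, flags=re.IGNORECASE)

print(df['has_keyword'].tolist())
# [False, False, False, True, True, True, False, False, True, True]
print(df['has_keyword'].any())  # True
```

Note that this is substring matching, so 'client' also matches 'clients' in row 8. If whole-word matches are wanted, wrapping the pattern in `\b(?:...)\b` is one option.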
Jayendra Parmar