I have a DataFrame with a column that contains some words or characters, and I'm trying to categorize each row by searching for keywords in that row's cell.
Example:
words | category
-----------------------------------
im a test email | email
here is my handout | handout
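A minimal frame reproducing the example above might look like the sketch below. The column name `words` comes from the code in this post; the name `expected` is only an illustrative placeholder for the desired `category` values.

```python
import pandas as pd

# Input rows taken from the example table; only the 'words' column exists at first.
df = pd.DataFrame({'words': ['im a test email', 'here is my handout']})

# Desired content of the 'category' column (illustrative name, not part of my real code).
expected = ['email', 'handout']
```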
Here is what I have:
conditions = [
df['words'].str.contains('flyer', case=False, regex=True),
df['words'].str.contains('report', case=False, regex=True),
df['words'].str.contains('form', case=False, regex=True),
df['words'].str.contains('scotia', case=False, regex=True),
df['words'].str.contains('news', case=False, regex=True),
df['words'].str.contains(r'questions.*\.pdf', case=False, regex=True),
.
.
.
.
]
choices = ['open house flyer',
'report',
'form',
'report',
'news',
'question',
.
.
.
.
]
df['category'] = np.select(conditions, choices, default='others')
This works fine, but the problem is that I have a lot of keywords (probably over 120 or so), so maintaining these two parallel lists is very difficult. Is there a better way to do this? By the way, I'm using Python 3.
Note: I'm looking for an easier way to manage a large list of keywords, which is different from simply a method for finding keywords.
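To make the goal concrete, here is a rough sketch of the kind of structure that might be easier to maintain: each pattern and its category live side by side in one mapping, and the `conditions` and `choices` lists are derived from it. The names `KEYWORD_CATEGORIES` and `categorize` are hypothetical, the sample rows are made up for illustration, and only the pairs shown above are included.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping: regex pattern -> category.
# Dicts keep insertion order, so earlier patterns take priority, as with np.select.
KEYWORD_CATEGORIES = {
    r'flyer': 'open house flyer',
    r'report': 'report',
    r'form': 'form',
    r'scotia': 'report',
    r'news': 'news',
    r'questions.*\.pdf': 'question',
}

def categorize(words, mapping, default='others'):
    """Return one category per row; the first matching pattern wins."""
    # na=False treats missing values as "no match" so every condition is boolean.
    conditions = [
        words.str.contains(pattern, case=False, regex=True, na=False)
        for pattern in mapping
    ]
    choices = list(mapping.values())
    return np.select(conditions, choices, default=default)

# Made-up sample rows, only to exercise the sketch.
df = pd.DataFrame({'words': ['Open house FLYER', 'scotia statement',
                             'questions v2.pdf', 'hello']})
df['category'] = categorize(df['words'], KEYWORD_CATEGORIES)
```

With this layout, adding a keyword is a one-line change, and the mapping could presumably be loaded from a CSV or JSON file instead of being hard-coded.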