I have a dataset in which I'm doing a string search. I have a list of strings to search for, and if any of them appears in a value, the flag for that row should be set to 'Y'.
For example:
# Import key libraries
import pandas as pd
import numpy as np
data = {'Strings': ['Profit Sharing', 'Defined Benefit', 'Defined Contribution', '401(K)']}
df = pd.DataFrame(data, columns=['Strings'])
df['Flag'] = np.nan
StringList = ['MONEY PURCHASE', 'MPP', 'DEFINED CONTRIBUTION', 'DEFINED CONT', 'SELF', 'KEOGH', 'KEOUGH', 'PROFIT', 'PSP', 'P-S PLAN', 'PS PL', 'SAVINGS', 'AGE-WEIGHTED', 'AGE WEIGHTED', 'NEW COMPARABILITY', 'THRIFT', 'STOCK BONUS', '401K', 'K401', '401(K)', '401 (K)', '4401-PW', '401PW', '401-K', '408K', '408 K', 'K408', '408(K)', '408 (K)', '408-K']
StringPattern = "|".join(StringList)
df['Flag'] = df['Strings'].str.contains(StringPattern, case=False)
print(df)
The '401(K)' entry is not picked up: its Flag comes back False even though '401(K)' is in StringList. It looks as though the search term is not being treated as a plain string. How do I fix that?
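A likely explanation, sketched here rather than stated as the definitive answer: `Series.str.contains` interprets its pattern as a regular expression by default (`regex=True`), so in '401(K)' the parentheses form a capture group and the term only matches the text "401K", never the literal "401(K)". Escaping each term with Python's `re.escape` before joining them with `|` should make every character match literally. The snake_case names `string_list`, `pattern` and `matches`, and the 'N' value for rows with no match, are illustrative choices, not part of the original code.

```python
import re

import numpy as np
import pandas as pd

data = {'Strings': ['Profit Sharing', 'Defined Benefit', 'Defined Contribution', '401(K)']}
df = pd.DataFrame(data, columns=['Strings'])

string_list = ['MONEY PURCHASE', 'MPP', 'DEFINED CONTRIBUTION', 'DEFINED CONT', 'SELF',
               'KEOGH', 'KEOUGH', 'PROFIT', 'PSP', 'P-S PLAN', 'PS PL', 'SAVINGS',
               'AGE-WEIGHTED', 'AGE WEIGHTED', 'NEW COMPARABILITY', 'THRIFT', 'STOCK BONUS',
               '401K', 'K401', '401(K)', '401 (K)', '4401-PW', '401PW', '401-K',
               '408K', '408 K', 'K408', '408(K)', '408 (K)', '408-K']

# Unescaped, "(K)" is a regex group: the term matches "401K" but not "401(K)".
assert re.search('401(K)', '401K') is not None
assert re.search('401(K)', '401(K)') is None

# Escape every term so regex metacharacters such as '(' and ')' match literally.
pattern = '|'.join(re.escape(s) for s in string_list)

# case=False lets 'Profit Sharing' match 'PROFIT'; regex=True is the default anyway.
matches = df['Strings'].str.contains(pattern, case=False, regex=True)

# Turn the boolean mask into the 'Y' flag ('N' for rows with no match).
df['Flag'] = np.where(matches, 'Y', 'N')
print(df)
```

With the sample data, the Flag column should read Y, N, Y, Y: only 'Defined Benefit' matches none of the terms. An alternative is passing `regex=False`, but that treats the whole joined string (including the `|` separators) as one literal, so it only works when testing one term at a time.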