
I'm running a string search over a dataset. I have a list of strings to search for, and if any of them appears in a row, the flag should be set to 'Y'.

For example...

#Import key libraries
import pandas as pd
import numpy as np

data = {'Strings': ['Profit Sharing', 'Defined Benefit', 'Defined Contribution', '401(K)']}

df = pd.DataFrame(data, columns=['Strings'])

df['Flag']=np.nan

StringList=['MONEY PURCHASE', 'MPP', 'DEFINED CONTRIBUTION', 'DEFINED CONT', 'SELF', 'KEOGH', 'KEOUGH', 'PROFIT', 'PSP', 'P-S PLAN', 'PS PL', 'SAVINGS', 'AGE-WEIGHTED', 'AGE WEIGHTED', 'NEW COMPARABILITY', 'THRIFT', 'STOCK BONUS', '401K', 'K401', '401(K)', '401 (K)', '4401-PW', '401PW', '401-K', '408K', '408 K', 'K408', '408(K)', '408 (K)', '408-K']

StringPattern="|".join(StringList)

df['Flag']=df['Strings'].str.contains(StringPattern, case=False)

print(df)

Strings containing '401', such as '401(K)', are not picked up. It looks like the search term isn't being treated as a literal string. How do I fix that?

Dhimmel90

1 Answer


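You need to escape special regex symbols. str.contains treats the pattern as a regular expression by default, so the parentheses in '401(K)' define a group instead of matching literal ( and ) characters. The pattern therefore matches '401K', not '401(K)'.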

https://docs.python.org/3/library/re.html
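
To see the grouping behaviour in isolation, here is a minimal sketch using only the standard re module. The variable names are made up for illustration, and re.IGNORECASE stands in for case=False from the question.

```python
import re

# Unescaped: the parentheses form a capture group, so this pattern means "401K".
unescaped = re.compile('401(K)', re.IGNORECASE)

# Escaped: \( and \) match literal parenthesis characters.
escaped = re.compile(r'401\(K\)', re.IGNORECASE)

print(unescaped.search('401(K)'))  # no match, the text has '(' right after '401'
print(unescaped.search('401k'))    # matches, because the group only contains K
print(escaped.search('401(K)'))    # matches the literal text
```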

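Change '401(K)' to '401\(K\)' in the string list, and do the same for the other entries that contain parentheses ('401 (K)', '408(K)', '408 (K)'). Writing them as raw strings, e.g. r'401\(K\)', keeps the backslashes intact.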
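
If you would rather not edit the list by hand, one option is to let re.escape do the escaping when the pattern is built. This is a sketch, not the only way to do it: the search list is shortened from the question, and string_list and string_pattern are just illustrative names.

```python
import re

import pandas as pd

df = pd.DataFrame({'Strings': ['Profit Sharing', 'Defined Benefit',
                               'Defined Contribution', '401(K)']})

# Shortened version of the search list from the question.
string_list = ['DEFINED CONTRIBUTION', 'PROFIT', '401K', '401(K)', '401 (K)']

# re.escape backslash-escapes regex metacharacters such as ( and ),
# so every entry is matched as literal text.
string_pattern = '|'.join(re.escape(s) for s in string_list)

df['Flag'] = df['Strings'].str.contains(string_pattern, case=False)
print(df)
```

With this pattern the '401(K)' row is flagged True, while 'Defined Benefit' stays False because nothing in the list matches it.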