Concise description
I am working on a project where I have a list of keywords (some of which contain special characters) and a string. I have to check whether any of the keywords are present in that string and extract them. The search is case-insensitive, but the exact keyword has to be present: if SAP is a keyword, then sap is a positive hit while saphire is a negative hit.
I have put in a lot of effort, but the output I get is only partially what I am looking for.
Here is some sample code to illustrate:
>>> import re
>>> keywords = ["HIPAA", "ERP(2.0)"]
>>> r = re.compile('|'.join([r'\b%s\b' % w for w in keywords]), flags=re.I)
>>> word = "HIPAAA and ERP(2.0)"
>>> r.findall(word)
[]
Here I should be getting this output: ["ERP(2.0)"]
I have checked out the question "Escape regex special characters in a Python string", but it doesn't really answer my question.
Can anyone guide me on how to make this work, considering that I have tens of keywords containing special characters and that I am importing those keywords from MySQL?
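For reference, a minimal sketch of one possible direction, assuming "exact keyword" means the keyword is not immediately preceded or followed by a word character. It escapes each keyword with re.escape and replaces \b with the lookarounds (?<!\w) and (?!\w), which do not require the keyword itself to start or end with a word character. The helper name build_keyword_pattern is hypothetical, and the keyword list is hard-coded here rather than loaded from MySQL.

```python
import re

def build_keyword_pattern(keywords):
    # Hypothetical helper. Escape each keyword so characters such as
    # "(" and "." are matched literally; sort longest first so a longer
    # keyword is tried before a shorter one that shares its prefix.
    escaped = [re.escape(w) for w in sorted(keywords, key=len, reverse=True)]
    # (?<!\w) and (?!\w) assert "no word character on this side", which
    # also holds next to punctuation and at the ends of the string.
    return re.compile(r'(?<!\w)(?:%s)(?!\w)' % '|'.join(escaped), flags=re.I)

pattern = build_keyword_pattern(["HIPAA", "ERP(2.0)", "SAP"])

print(pattern.findall("HIPAAA and ERP(2.0)"))
# ['ERP(2.0)']
print(pattern.findall("we use sap, not saphire, and erp(2.0)"))
# ['sap', 'erp(2.0)']
```

The non-capturing group (?:...) is used so that findall returns the whole match rather than the contents of a group.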
Detailed description
Test 1
>>> keywords = ["HIPAA", "ERP"]
>>> r = re.compile('|'.join([r'\b%s\b' % w for w in keywords]), flags=re.I)
>>> word = "HIPAA and ERP"
>>> r.findall(word)
['HIPAA', 'ERP']
Test 2
>>> keywords = ["HIPAA", "ERP(2.0)"]
>>> r = re.compile('|'.join([r'\b%s\b' % w for w in keywords]), flags=re.I)
>>> word = "HIPAA and ERP(2.0)"
>>> r.findall(word)
['']
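The [''] result in Test 2 appears to come from two separate effects: the unescaped parentheses in ERP(2.0) form a capturing group instead of matching literal parentheses, and re.findall returns the group's contents rather than the whole match when the pattern contains a single group. A small sketch to illustrate (the variable names are only for this example):

```python
import re

pattern = re.compile(r'\bHIPAA\b|\bERP(2.0)\b', flags=re.I)
text = "HIPAA and ERP(2.0)"

# With one capturing group, findall returns that group for each match.
# Only the HIPAA branch matches here, and in that branch the group did
# not participate, so findall reports an empty string.
group_results = pattern.findall(text)

# finditer exposes the whole match, which shows what was really matched.
whole_matches = [m.group(0) for m in pattern.finditer(text)]

print(group_results)   # ['']
print(whole_matches)   # ['HIPAA']
```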
Test 3
>>> keywords = ["HIPAA", r"ERP\(2.0\)"]
>>> r = re.compile('|'.join([r'\b%s\b' % w for w in keywords]), flags=re.I)
>>> word = "HIPAA and ERP(2.0)"
>>> r.findall(word)
['HIPAA']
Test 4
>>> keywords = ["HIPAA", "ERP(2.0)"]
>>> r = re.compile('|'.join([r'\b%s\b' % re.escape(w) for w in keywords]), flags=re.I)
>>> word = r"HIPAASTOL and ERP(2.0)"
>>> r.findall(word)
[]
Test 5
>>> keywords = ["HIPAA", "ERP(2.0)"]
>>> r = re.compile('|'.join([re.escape(w) for w in keywords]), flags=re.I)
>>> word = r"HIPAASTOL and ERP(2.0)"
>>> r.findall(word)
['HIPAA', 'ERP(2.0)']
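Tests 3 and 4 seem to miss the ERP(2.0) keyword because \b only matches between a word character and a non-word character. After the closing ")" at the end of the string there is a non-word character on one side and nothing on the other, so the trailing \b cannot match. A short check, as a sketch:

```python
import re

escaped = re.escape("ERP(2.0)")

# ")" followed by end of string: no word character on either side,
# so the trailing \b fails and there is no match.
at_end = re.search(escaped + r'\b', "HIPAA and ERP(2.0)")

# ")" followed by a letter: now there is a non-word/word transition,
# so \b matches, which is the opposite of the intended behavior.
before_letter = re.search(escaped + r'\b', "ERP(2.0)x")

print(at_end)                  # None
print(before_letter.group(0))  # ERP(2.0)
```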
Thanks in advance :)