I am working with Python and I want to find which of several substrings occur in a given string. I have tried to solve this problem in two different ways. My first solution was a plain substring check:
str = "This is a test string from which I want to match multiple substrings"
value = ["test", "match", "multiple", "ring"]
temp = []
temp.extend([x.upper() for x in value if x.lower() in str.lower()])
print(temp)
which results in temp = ["TEST", "MATCH", "MULTIPLE", "RING"].
However, this is not the result I would like. The substrings should only match as whole words, so "ring" should not match inside "string".
This is why I tried to solve this problem with regular expressions, like this:
str = "This is a test string from which I want to match multiple substrings"
value = ["test", "match", "multiple", "ring"]
temp = []
temp.extend([
    x.upper() for x in value
    if regex.search(
        r"\b" + regex.escape(x) + r"\b", str, regex.IGNORECASE
    ) is not None
])
print(temp)
which results in ["TEST", "MATCH", "MULTIPLE"], the correct result.
However, this solution takes too long to compute. I have to run this check on roughly 1 million strings, and the regex solution would take days to finish, compared to the 1.5 hours the first solution takes.
Is there a way to either make the first solution match whole words only, or make the second solution run faster?
Note that value can also contain numbers, or a short phrase like "test1 test2".
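
One direction that might be worth trying is to build a single word-bounded alternation pattern from all entries in value, compile it once, and reuse it for every string, instead of running one search per entry per string. The following is only a minimal sketch under some assumptions: it uses the standard-library re module rather than the regex name used above, the helper names build_matcher and find_matches are hypothetical, and no timing has been measured. One caveat: because the scan finds non-overlapping matches, an entry that occurs only inside a longer matched phrase (for example "test2" inside "test1 test2") would not be reported separately, which differs from the per-entry search.

```python
import re


def build_matcher(values):
    """Compile one case-insensitive, word-bounded pattern for all values."""
    # Put longer entries first so a phrase wins over a shorter entry
    # that starts at the same position.
    ordered = sorted(values, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_matches(matcher, values, text):
    """Return the upper-cased values that occur as whole words in text."""
    # A single scan over the text collects every non-overlapping match.
    found = {m.group(0).lower() for m in matcher.finditer(text)}
    # Keep the original order of values in the result.
    return [v.upper() for v in values if v.lower() in found]


value = ["test", "match", "multiple", "ring"]
text = "This is a test string from which I want to match multiple substrings"

matcher = build_matcher(value)      # compile once, outside the loop over strings
print(find_matches(matcher, value, text))
# prints ['TEST', 'MATCH', 'MULTIPLE']
```

The compiled matcher would be created once and then passed to find_matches for each of the strings, so the cost of building the pattern is not paid again per string.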