I read regexes and their replacements from a CSV into a dictionary, then run each one over a column in a DataFrame, looking for locations:
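for regex, replacement in regex_replace.items():
    # regex=True is spelled out because newer pandas versions default to literal replacement
    df["A"] = df["a"].str.replace(regex, replacement, regex=True)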
This works fine and successfully replaces the text. An example regex would be:
(?i)\b(maine)
However, I also want to capture the text that each regex matched and replaced. I've tried this:
import re

def find_match(regex, x):
    # Join every match of the pattern in x into one comma-separated string
    j = re.findall(r'{0}'.format(regex), x)
    return ",".join(j)

df['matches'] = df['A'].apply(lambda x: find_match(regex, str(x)))
But that doesn't find any matches - I think it's because the backslash is escaped. If I declared the regex variable as a raw string in the code, then this would work:
regex = r'(?i)\b(maine)'
However, I can't do that, as the pattern is already stored in a variable. Is there a way to do this?
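A note on the raw-string point: the `r` prefix only changes how a literal in source code is parsed, so a pattern read from a file should already hold a single backslash followed by `b`, the same as `r'(?i)\b(maine)'`. The sketch below checks that with the standard library, and it also collects matches before substituting, since searching a column after the replacement has run may find nothing. The CSV content, the replacements `ME` and `TX`, and the helper name `replace_and_capture` are assumptions for illustration, not part of the original code.

```python
import csv
import io
import re

# Hypothetical CSV content standing in for the real file (pattern,replacement).
# The raw literal is only needed to build this sample; a file on disk needs no escaping.
csv_text = r"""(?i)\b(maine),ME
(?i)\b(texas),TX
"""

# Each pattern read here already contains one backslash followed by "b".
regex_replace = {pattern: repl for pattern, repl in csv.reader(io.StringIO(csv_text))}

# Compare the first pattern from the "file" with the raw-string literal.
first_pattern = next(iter(regex_replace))
same_as_raw = first_pattern == r"(?i)\b(maine)"


def replace_and_capture(text, patterns):
    """Return (new_text, matches), recording matches before each substitution."""
    matches = []
    for pattern, repl in patterns.items():
        matches.extend(re.findall(pattern, text))  # capture first
        text = re.sub(pattern, repl, text)         # then replace
    return text, ",".join(matches)


new_text, found = replace_and_capture("Moved from Maine to texas", regex_replace)
```

If this behaves the same on the real data, the helper could presumably be applied row by row to the original column (for example with `apply`) to fill both the replaced text and the `matches` column in one pass.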
Related answers are: "regex re.search is not returning the match" and "Python Regex in Variable".