Using Python as my scripting language, this is the regex in question: [\(\[](\w+ ?)+[\)\]]
It matches words enclosed in a set of parentheses or brackets, e.g., (this would match); [this would match]; and in (this) and (this), each parenthesized part would also match.
The expression works fine for one-off string matches; however, when I use it as a pattern in a broader text processing pipeline, it slows the process down tremendously. If I remove that one pattern, a dataframe of 77k+ rows processes almost instantly. With the above pattern, the estimated run time is about 2 hours.
What's going on here? I've tried removing the brackets and just looking for parens, which seems to have sped things up a tad, but this just doesn't make any intuitive sense.
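To make the slowdown concrete, here is a minimal sketch that times the pattern on made-up inputs. The strings are hypothetical, not taken from my dataframe; I am assuming that some rows contain an opening bracket that is never closed, which seems to be the case that forces the engine to try every way of splitting the words between \w+ and the outer + before giving up. The helper name time_failed_search is just for illustration.

```python
import re
import time

# The pattern from the question, unchanged.
pattern = re.compile(r"[\(\[](\w+ ?)+[\)\]]")

def time_failed_search(n):
    """Search '(' followed by n word characters and no closing bracket.

    Returns the search result (expected to be None) and the elapsed seconds.
    """
    text = "(" + "a" * n  # hypothetical input: bracket opened, never closed
    start = time.perf_counter()
    result = pattern.search(text)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Each extra character should roughly double the time on a failing input.
for n in (12, 16, 20):
    result, elapsed = time_failed_search(n)
    print(f"n={n:2d}  match={result}  seconds={elapsed:.4f}")
```

If the timings grow sharply with n, a handful of rows like this in 77k+ rows could plausibly account for the difference between "almost instant" and hours.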
NOTE: This similar expression, [\(\[].+[\)\]], works as fast as expected, but is too aggressive in what it would remove. With the above example of (this) and (this), it would remove everything between the first and last bracket, resulting in an empty string.
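A short sketch of the difference between the two patterns on that example, using re.sub to stand in for the removal step (my actual pipeline code is not shown here, so the variable names are only illustrative):

```python
import re

# The slow-but-precise pattern and the fast-but-greedy one from the note.
precise = re.compile(r"[\(\[](\w+ ?)+[\)\]]")
greedy = re.compile(r"[\(\[].+[\)\]]")

text = "(this) and (this)"

# Removes each bracketed group separately and keeps the text in between.
precise_result = precise.sub("", text)

# .+ is greedy, so it runs from the first "(" to the last ")".
greedy_result = greedy.sub("", text)

print(repr(precise_result))  # ' and '
print(repr(greedy_result))   # ''
```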
EDIT: A detailed explanation was shared at this duplicate question (Fixing Catastrophic Backtracking in Regular Expression); however, the responders below helped address the specifics of my question.