
This regex query runs fine when I don't insert any character that is not in [,.] before the word 'here':

RegEx.Replace("My products or something else here ", "My ((?:[a-z']* ??)*?)\s*([,.]|$| here)", "")

But it becomes very slow (freezing for about 3-5 seconds or more) if I insert a character that is not in [,.] before the word 'here'. For example, with a '/' inserted before the word 'here':

RegEx.Replace("My products or something / else here ", "My ((?:[a-z']* ??)*?)\s*([,.]|$| here)", "")

The problem goes away when I add / to the character class [,.] in my pattern:

RegEx.Replace("My products or something / else here ", "My ((?:[a-z']* ??)*?)\s*([/,.]|$| here)", "")

But I want my regex to ignore the / character instead of treating it as the end of my sentence. Why does this problem occur, and how can I resolve it?

monocular

1 Answer


You are a victim of catastrophic backtracking. This part:

(?:[a-z']* ??)*?

can match the words in an exponential number of possible combinations. Since the space is optional, the word 'else' alone can be matched in all of these variations (where the parentheses indicate what is matched by one "instance" of the inner group):

(else)
(els)(e)
(el)(se)
(el)(s)(e)
(e)(lse)
(e)(l)(se)
(e)(ls)(e)
(e)(l)(s)(e)
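To make the count concrete, here is a small sketch in Python rather than the question's .NET (`splits` is a hypothetical helper written only for this illustration). It enumerates every way to cut a word into consecutive non-empty pieces, i.e. the variations listed above. A word of n letters has 2^(n-1) of them, because each of the n-1 gaps between letters is either a cut or not.

```python
from itertools import combinations


def splits(word):
    """Yield every way to cut `word` into consecutive, non-empty pieces.

    These are the ways the optional-space group (?:[a-z']* ??)*? can
    divide a single word among its repetitions.
    """
    n = len(word)
    # Each of the n - 1 gaps between letters is either a cut or not.
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [word[a:b] for a, b in zip(bounds, bounds[1:])]


# Reproduces the eight variations of 'else' (in a different order).
for split in splits("else"):
    print("".join(f"({piece})" for piece in split))

# The count doubles with every extra letter: 2 ** (len(word) - 1).
for word in ("else", "products", "something"):
    print(word, sum(1 for _ in splits(word)))  # 8, 128, 256
```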

This explodes for longer words, and especially for an entire sentence. In general, the problem occurs whenever you have nested repetition and it is not clear where one repetition ends and the next begins. Then, if there is no match, the engine needs to backtrack through all of these cases before it can declare failure. That is exactly what the '/' causes: neither [a-z'] nor the optional space can consume it, and it is not [,.], the end of the string, or " here", so the overall match fails. If there is a match, the backtracking is usually unnecessary and the problem goes unnoticed, which is why adding / to [,.] makes the slowdown disappear. The best fix is to use an "unrolling-the-loop" pattern, which makes the space mandatory in the repetition:

"My ([a-z']*(?: [a-z']*)*?)\s*([,.]|$| here)"

Now that the space is mandatory, each "instance" of the repeated group has to match an entire word, which should resolve the problem.
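A rough way to watch this happen, sketched in Python rather than .NET (an assumption for illustration: Python's `re` is also a backtracking engine, so these patterns behave similarly, though absolute timings will differ by engine and machine). The `timed_sub` helper and the repeated "abcd" inputs are hypothetical, chosen only to scale the number of words before the '/'.

```python
import re
import time

# Patterns from the question and the answer, written for .NET; Python's
# backtracking `re` engine handles them similarly for these inputs.
ORIGINAL = r"My ((?:[a-z']* ??)*?)\s*([,.]|$| here)"
WITH_SLASH = r"My ((?:[a-z']* ??)*?)\s*([/,.]|$| here)"
FIXED = r"My ([a-z']*(?: [a-z']*)*?)\s*([,.]|$| here)"


def timed_sub(pattern, text):
    """Run re.sub(pattern, "", text) once; return (result, seconds)."""
    start = time.perf_counter()
    result = re.sub(pattern, "", text)
    return result, time.perf_counter() - start


# Hypothetical inputs: n copies of "abcd" before the slash. Neither pattern
# can match, so the result equals the input; only the time differs. The
# original pattern's cost multiplies with each word, so n stays small.
for n in range(1, 5):
    text = "My " + " ".join(["abcd"] * n) + " / else here "
    for name, pattern in (("original", ORIGINAL), ("fixed", FIXED)):
        result, seconds = timed_sub(pattern, text)
        print(f"{name:8} n={n} unchanged={result == text} {seconds:.6f}s")

# The question's own inputs: the fixed pattern fails fast on the slash
# (string unchanged) and gives the original's result without it.
slash_text = "My products or something / else here "
print(repr(timed_sub(FIXED, slash_text)[0]))
print(repr(timed_sub(FIXED, "My products or something else here ")[0]))

# Adding '/' to the class avoids the slowdown only because a match now
# exists, and that match ends at the slash.
print(repr(timed_sub(WITH_SLASH, slash_text)[0]))
```

On a typical machine the original pattern's time should grow by a large factor per added word, while the fixed pattern stays roughly flat. The `slash_text` result also shows that the rewritten pattern still does not match across '/'; it just fails quickly instead of freezing.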

Martin Ender