3

I've a regular expression that should validate if a string is composed by space-delimited strings. The regular expression works well (ok it allows a empty space in the end ... but that's not he problem) but takes too long when the validation fails.

The regular expression is the following:

/^(([\w\-]+)( )?){0,}$/

When trying to validate with the string

"'this-is_SAMPLE-scope-123,this-is_SAMPLE-scope-456'"

it takes 2 seconds.

The tests were performed in ruby 1.9.2-rc1 and 1.8.7. But this is probably a general problem.

Any idea?

Amarghosh
  • 58,710
  • 11
  • 92
  • 121
  • FWIW, it seems OK here on 1.9.1, I've just tried `/^(([\w-]+)( )?){0,}$/.match "'this-is_SAMPLE-scope-123,this-is_SAMPLE-scope-456'"` and it returns `nil` instantly. – mikej Jul 09 '10 at 11:52
  • with ruby 1.9.1 i'm getting the following results with Benchmark module: user system total real 2.020000 0.010000 2.030000 ( 2.024168) – miguelfteixeira Jul 09 '10 at 12:01
  • @miguel: give examples of what you're trying to match, and what you don't want to match. The more examples the better; make sure they cover all of your cases. – polygenelubricants Jul 09 '10 at 12:03

1 Answers1

9

Your pattern causes catastrophic backtracking. The catastrophic part can be summarized to this:

(.+)*

The + and the * interacts in catastrophic ways in some engines.

It's unclear what you're trying to match, exactly, but it may be something like this:

^[\w\-]+( [\w\-]+)*$

This matches (as seen on rubular.com):

hello world
99 bottles of beer on the wall
this_works_too

and rejects:

not like this, not like this
hey what the &#@!
too many    spaces

Another option would be to use possessive quantifiers and/or atomic groupings in parts of the original pattern.

References


Additional tips

The {0,} repetition is usually written simply as *. You can also use non-capturing groups to improve performance, i.e. (?:pattern).

References

Related questions

Community
  • 1
  • 1
polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
  • Thanks, with your solution the benchmark shows the following results: 0.000000 0.000000 0.000000 ( 0.000021) Also thanks for the tips. – miguelfteixeira Jul 09 '10 at 12:09
  • Here's a demonstration of the catastrophic backtracking in the original pattern: `^(x+)*$` on input string `xxxxxxxxxxx!` -- http://www.rubular.com/r/swdpfK3HOY -- add more `x` and eventually rubular will time-out. – polygenelubricants Jul 09 '10 at 12:33