3
  • How can I check whether a Ruby 2.0 onigmo (or 1.9 oniguruma) regex includes inefficiency such as catastrophic backtracking?

  • How can I follow the internal steps that a Ruby 2.0 or (1.9) regex match is attempted against a string?

sawa
  • 165,429
  • 45
  • 277
  • 381
  • 2
    What is up with the close votes for the question "not being about programming"??? How could a question be *more* closely related to programming than this? – Alex D Jun 30 '13 at 17:01
  • I'm guessing that you are allowing users of your app to do regex searches on data, and want to prevent DOS attacks from malicious users. If that is the case, an alternate solution would be to run each regex search in a separate thread, and kill threads which take too long to finish. – Alex D Jun 30 '13 at 17:05
  • @AlexD Thanks for the comment, but the regexes are actually written by me. It is used to analyze a DSL, and it is very complicated. For example, it includes recursive reference to match balanced parentheses. Shamefully, I cannot fully check my own regexes whether they are effective or not. – sawa Jun 30 '13 at 17:07
  • 1
    @sawa: insofar as I was reading earlier today, the gist of avoiding catastrophic backtracking is to avoid `(expression*)*` and variations thereof. If you that sort of thing, you probably have backtracking problems -- any possibility to match a set of characters with two cases or more in your regex is potentially problematic, because it'll end up being explored. (Good question, btw. I wish this kind of stuff wasn't deemed off topic on SO...) – Denis de Bernardy Jun 30 '13 at 19:34
  • 1
    @AlexD, maybe because the question uses the phrase `Is there a tool`. It could be changed to `How can I` to become more `"on-topic"`. Anyways, +1 on the question, I've voted for reopen. – Dogbert Jun 30 '13 at 19:58
  • Thanks for all the comments. I will change the question a bit. – sawa Jul 01 '13 at 01:50

1 Answers1

3

Regexbuddy, according to this page, allows to detect them:

Regexbuddy is forgiving in that it detects it's going in circles, and aborts the match attempt. Other regex engines (like .NET) will keep going forever, while others will crash with a stack overflow (like Perl, before version 5.10).

http://www.regular-expressions.info/catastrophic.html

(Not sure how up to date the info is)

Also found this related question:

How do you debug a regex?

In the case of ruby, there also seems to be additional compile flags that you can enable:

recompiled Ruby 2.0 (ruby-head) from source after setting a couple of special compiler flags: ONIG_DEBUG_COMPILE and ONIG_DEBUG_MATCH

http://patshaughnessy.net/2012/4/3/exploring-rubys-regular-expression-algorithm

Community
  • 1
  • 1
Denis de Bernardy
  • 75,850
  • 13
  • 131
  • 154
  • 2
    Note that regexbuddy doesn't support possessive quantifiers in ruby mode. – Casimir et Hippolyte Jun 30 '13 at 11:57
  • 1
    RegexBuddy just tries to perform a match on a whatever string you feed it, and if it hasn't gotten a result after an arbitrary number of operations, it quits. Then you can switch to Debug mode and see exactly what it was doing. It works great, but I think he wants a tool that can analyze the regex alone and warn you if it *could* result in never-ending matches. – Alan Moore Jul 01 '13 at 04:30