
The Rust regex crate offers the regex! syntax extension, which makes it possible to compile a regex at compile time. This is good in two ways:

  • we don't need to do that work at runtime (better runtime performance)
  • if our regex is malformed, the compiler can tell us during compilation instead of triggering a runtime panic
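Here is a minimal sketch of the second point: today's stable Rust can run simple checks on a string literal during compilation through const evaluation (`assert!` in a const context needs Rust 1.57 or later). The `parens_balanced` function and the `PATTERN` constant are hypothetical. They only check parenthesis nesting and ignore escapes like `\(`. They are not part of the regex crate, whose real parser does far more.

```rust
// Hypothetical check: are the parentheses in a pattern balanced?
// Only an illustration of compile-time validation; escapes such as \( are ignored.
const fn parens_balanced(pattern: &str) -> bool {
    let bytes = pattern.as_bytes();
    let mut depth: usize = 0;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'(' {
            depth += 1;
        } else if bytes[i] == b')' {
            if depth == 0 {
                return false; // closing paren with no matching opener
            }
            depth -= 1;
        }
        i += 1;
    }
    depth == 0 // every opener must have been closed
}

// Evaluated by the compiler: if PATTERN were malformed, this would be a build error,
// not a runtime panic.
const PATTERN: &str = "(t?)rust";
const _: () = assert!(parens_balanced(PATTERN));

fn main() {
    // The same function can still be called at runtime for dynamic input.
    assert!(!parens_balanced("(t?rust"));
    println!("{} passed the compile-time check", PATTERN);
}
```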

Unfortunately, the docs say:

WARNING: The regex! compiler plugin is orders of magnitude slower than the normal Regex::new(...) usage. You should not use the compiler plugin unless you have a very special reason for doing so.

This sounds like a completely different regex engine is used for regex! than for Regex::new(). Why isn't regex!() just a wrapper around Regex::new() that combines the advantages of both worlds? As I understand it, these syntax-extension compiler plugins can execute arbitrary code, so why not run Regex::new() at compile time?

Shepmaster
Lukas Kalbertodt

1 Answer


The answer is very subtle: one feature of the macro is that the result of regex! can be put into static data, like so:

```rust
// regex! needs a nightly compiler and the regex_macros plugin
static RE: Regex = regex!("t?rust");
```

The main problem is that Regex::new() allocates on the heap while compiling a regex, and a static initializer cannot perform heap allocations. Supporting static storage would therefore require rewriting the Regex::new() engine. You can also read burntsushi's comment about this issue on reddit.
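Here is a small illustration of the static-storage problem, using only the standard library. The `Program` type is hypothetical and stands in for a compiled regex. A static initializer may not allocate, so a heap-backed value can only be built at runtime. Borrowed `'static` data, the kind a compiler plugin can write straight into the binary, needs no allocation and is fine.

```rust
// Hypothetical stand-in for the compiled program that Regex::new() builds on the heap.
struct Program {
    insts: Vec<u8>,
}

// Compiles: Vec::new() is a const fn and performs no allocation.
static EMPTY: Program = Program { insts: Vec::new() };

// Would NOT compile: allocating (vec!, Vec::with_capacity, String::from, ...) is not
// allowed in a static initializer.
// static FULL: Program = Program { insts: vec![1, 2, 3] };

// Compiles: borrowed data with a 'static lifetime is baked into the binary, no heap needed.
static TABLE: &[u8] = &[1, 2, 3];

// The heap-backed version has to be built at runtime instead.
fn build_program() -> Program {
    Program { insts: vec![1, 2, 3] }
}

fn main() {
    let runtime = build_program();
    assert!(EMPTY.insts.is_empty());
    assert_eq!(runtime.insts.as_slice(), TABLE);
    println!("static table has {} entries", TABLE.len());
}
```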


There are some suggestions for how to improve regex!:

  • Drop the static support and just validate the regex string at compile time, while still compiling the regex at runtime
  • Keep the static support by using the same trick that lazy_static! uses
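The lazy_static! trick defers construction until the first access and then hands out a shared `&'static` reference. The sketch below shows the same idea using the standard library's `std::sync::OnceLock` (stable since Rust 1.70) rather than the lazy_static crate itself. `Program` and `compile` are hypothetical stand-ins for a compiled regex and Regex::new().

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical heap-backed compiled regex, standing in for what Regex::new() produces.
struct Program {
    pattern: String,
    insts: Vec<u8>,
}

// Counts how often the (expensive) compile step actually runs.
static COMPILES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical compile function: allocates at runtime, like Regex::new().
fn compile(pattern: &str) -> Program {
    COMPILES.fetch_add(1, Ordering::SeqCst);
    Program {
        pattern: pattern.to_string(),
        insts: pattern.bytes().collect(),
    }
}

// The static is cheap to construct (OnceLock::new() is const); the heap-backed
// value is created on first access and then shared for the rest of the program.
static RE: OnceLock<Program> = OnceLock::new();

fn re() -> &'static Program {
    RE.get_or_init(|| compile("t?rust"))
}

fn main() {
    let a = re();
    let b = re();
    assert!(std::ptr::eq(a, b)); // the same instance both times
    assert_eq!(COMPILES.load(Ordering::SeqCst), 1); // compiled exactly once
    println!("{} compiled to {} bytes", a.pattern, a.insts.len());
}
```

The trade-off is that the first caller pays the compile cost at runtime, and a malformed pattern is still only detected then.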

As of the beginning of 2017, the developers are focused on stabilizing the crate's main API for the 1.0 release. Since regex! requires a nightly compiler anyway, it has a low priority right now.

However, the compiler-plugin approach could offer even better performance than Regex::new(), which is already super fast: since the regex's DFA could be compiled into code instead of data, it has the potential to run a bit faster and to benefit from compiler optimizations. More research is needed to know for sure.
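To make "code instead of data" concrete, here is a toy sketch of a DFA for `t?rust`. It is not how the regex crate is implemented, and for simplicity it matches the whole input (anchored at both ends). `matches_table` interprets a transition table at runtime, the way a data-driven engine does. `matches_code` hard-codes the same transitions as a `match`, roughly the kind of code a code-generating macro could emit and the optimizer could then work on directly.

```rust
// Toy DFA for the pattern t?rust, matching the whole input.
const ACCEPT: u8 = 5;
const DEAD: u8 = 6;
const D: u8 = DEAD;

// Map each byte to a column of the transition table.
fn class(b: u8) -> usize {
    match b {
        b't' => 0,
        b'r' => 1,
        b'u' => 2,
        b's' => 3,
        _ => 4,
    }
}

// "Data" version: transitions live in a table that a generic loop interprets.
// Columns: byte class t, r, u, s, other.
static TRANS: [[u8; 5]; 7] = [
    [1, 2, D, D, D], // 0: start
    [D, 2, D, D, D], // 1: after the optional 't'
    [D, D, 3, D, D], // 2: after 'r'
    [D, D, D, 4, D], // 3: after 'u'
    [5, D, D, D, D], // 4: after 's'
    [D, D, D, D, D], // 5: accept (any further byte fails)
    [D, D, D, D, D], // 6: dead
];

fn matches_table(input: &str) -> bool {
    let mut state = 0u8;
    for &b in input.as_bytes() {
        state = TRANS[state as usize][class(b)];
        if state == DEAD {
            return false;
        }
    }
    state == ACCEPT
}

// "Code" version: the same transitions written out as control flow.
fn matches_code(input: &str) -> bool {
    let mut state = 0u8;
    for &b in input.as_bytes() {
        state = match (state, b) {
            (0, b't') => 1,
            (0 | 1, b'r') => 2,
            (2, b'u') => 3,
            (3, b's') => 4,
            (4, b't') => 5,
            _ => return false, // no transition: dead state
        };
    }
    state == ACCEPT
}

fn main() {
    for s in ["trust", "rust", "tust", "ttrust", "rustt"] {
        assert_eq!(matches_table(s), matches_code(s));
        println!("{s:?}: {}", matches_code(s));
    }
}
```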

  • What I personally would like to see is a `const fn` story sufficiently advanced to allow allocating memory, such that a Regex could be compiled at compile time without any trick... It may never see the light of day, and it would not help much for the lazy DFA approach. – Matthieu M. Jan 06 '17 at 11:40
  • I think "potential to run a lot faster" is overstating things. Otherwise, good answer. :-) – BurntSushi5 Jan 06 '17 at 12:16
  • @BurntSushi5 psssh, what do you even know about it anyway? (For others: BurntSushi5 is the primary author of the regex crate ^_^) – Shepmaster Jan 06 '17 at 13:22
  • @BurntSushi5 I slightly edited the sentence ;-) I guess I just really like the idea that regexes could compile down to the exact code I would have written by hand (or probably even better code!). – Lukas Kalbertodt Jan 06 '17 at 13:26