Why isn't `regex!` a wrapper for `Regex::new` to offer the same regex matching speed?

Question

The Rust regex crate offers the regex! syntax extension, which makes it possible to compile a regex while the Rust program itself is being compiled. This is good in two ways:

- we don't need to do that work during runtime (better program performance)
- if our regex is malformed, the compiler can tell us during compilation instead of triggering a runtime panic

Unfortunately, the docs say:

WARNING: The regex! compiler plugin is orders of magnitude slower than the normal Regex::new(...) usage. You should not use the compiler plugin unless you have a very special reason for doing so.

This sounds like a completely different regex engine is used for regex! than for Regex::new(). Why isn't regex!() just a wrapper for Regex::new() to combine the advantages from both worlds? As I understand it, these syntax-extension compiler plugins can execute arbitrary code; why not Regex::new()?

Nice one, it's something that can indeed puzzle newcomers who didn't follow the development of the crate! — Matthieu M., Jan 06 '17 at 11:38

score 19 · Accepted Answer · edited Jan 06 '17 at 13:27

The answer is very subtle: one feature of the macro is that the result of regex! can be put into static data, like so:

static r: Regex = regex!("t?rust");

The main problem is that Regex::new() allocates on the heap while compiling the regex. A static initializer has to be evaluated at compile time, where heap allocation is not available, so supporting static storage would require a rewrite of the Regex::new() engine. You can also read BurntSushi's comment about this issue on reddit.

There are some suggestions about how to improve regex!:

- Drop the static support and just validate the regex string at compile time while still compiling the regex at runtime
- Keep the static support by using a similar trick as lazy_static! does
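To make the second suggestion more concrete, here is a minimal sketch of lazy initialization behind a static, using only the standard library's OnceLock instead of lazy_static!. CompiledPattern and rust_pattern are hypothetical stand-ins invented for this illustration; they are not part of the regex crate, and the "matching" is just a literal substring search.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a compiled regex. It owns heap memory (a `Vec`),
/// so a filled-in value cannot be produced by a plain `static` initializer.
struct CompiledPattern {
    literal: Vec<u8>,
}

impl CompiledPattern {
    /// "Compiles" the pattern at runtime; the heap allocation happens here.
    /// Assumes a non-empty pattern (`windows(0)` below would panic otherwise).
    fn new(pattern: &str) -> CompiledPattern {
        CompiledPattern { literal: pattern.as_bytes().to_vec() }
    }

    /// Naive literal substring search, only to make the sketch observable.
    fn is_match(&self, haystack: &str) -> bool {
        haystack
            .as_bytes()
            .windows(self.literal.len())
            .any(|window| window == self.literal.as_slice())
    }
}

/// Returns the shared pattern, building it on first use.
fn rust_pattern() -> &'static CompiledPattern {
    // `OnceLock::new()` is a `const fn`, so the *empty* cell is a valid
    // static initializer; the value is filled in later, at runtime.
    static PATTERN: OnceLock<CompiledPattern> = OnceLock::new();
    PATTERN.get_or_init(|| CompiledPattern::new("rust"))
}

fn main() {
    assert!(rust_pattern().is_match("I trust rustc"));
    assert!(!rust_pattern().is_match("ruby"));
    // Both calls hand back the same instance: it was built only once.
    assert!(std::ptr::eq(rust_pattern(), rust_pattern()));
}
```

The static only ever holds an empty cell at compile time, which is why this approach sidesteps the heap-allocation problem; the cost is that the first caller pays for the compilation at runtime.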

As of the beginning of 2017, the developers are focused on stabilizing the standard API to release version 1.0. Since regex! requires a nightly compiler anyway, it has a low priority right now.

However, the compiler-plugin approach could offer even better performance than Regex::new(), which is already super fast: since the regex's DFA could be compiled into code instead of data, it has the potential to run a bit faster and benefit from compiler optimizations. But more research has to be done in the future to know for sure.
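The "code instead of data" distinction can be sketched by hand for the pattern t?rust. The example below is only an illustration under simplifying assumptions: it matches the whole input (anchored), unlike the crate's search semantics, and the names TABLE, class, matches_table and matches_code are made up for this sketch rather than taken from the regex crate.

```rust
// States: 0 start, 1 saw optional 't', 2 saw 'r', 3 saw 'u', 4 saw 's',
// 5 accept (saw final 't'), 6 dead.
const ACCEPT: usize = 5;
const DEAD: usize = 6;

/// Maps an input byte to a column of the transition table.
fn class(byte: u8) -> usize {
    match byte {
        b't' => 0,
        b'r' => 1,
        b'u' => 2,
        b's' => 3,
        _ => 4,
    }
}

/// DFA as data: rows are states, columns are byte classes t, r, u, s, other.
const TABLE: [[usize; 5]; 7] = [
    [1, 2, DEAD, DEAD, DEAD],         // 0: start
    [DEAD, 2, DEAD, DEAD, DEAD],      // 1: saw optional 't'
    [DEAD, DEAD, 3, DEAD, DEAD],      // 2: saw 'r'
    [DEAD, DEAD, DEAD, 4, DEAD],      // 3: saw 'u'
    [ACCEPT, DEAD, DEAD, DEAD, DEAD], // 4: saw 's'
    [DEAD; 5],                        // 5: accept, any further input fails
    [DEAD; 5],                        // 6: dead
];

/// Interprets the table at runtime: one lookup per input byte.
fn matches_table(input: &str) -> bool {
    let mut state = 0;
    for &byte in input.as_bytes() {
        state = TABLE[state][class(byte)];
    }
    state == ACCEPT
}

/// DFA as code: the transitions are spelled out in a `match`, so the
/// compiler sees them directly and is free to optimize them.
fn matches_code(input: &str) -> bool {
    let mut state = 0u8;
    for &byte in input.as_bytes() {
        state = match (state, byte) {
            (0, b't') => 1,
            (0, b'r') | (1, b'r') => 2,
            (2, b'u') => 3,
            (3, b's') => 4,
            (4, b't') => 5,
            _ => return false, // dead state: bail out early
        };
    }
    state == 5
}

fn main() {
    for input in ["rust", "trust", "", "trus", "ttrust", "rustt"] {
        // Both formulations must agree on every input.
        assert_eq!(matches_table(input), matches_code(input));
    }
    assert!(matches_code("rust") && matches_code("trust"));
}
```

Whether the second form is actually faster for realistic patterns is exactly the open question mentioned above; the sketch only shows what the two representations look like.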

edited Jan 06 '17 at 13:27

Shepmaster

answered Jan 06 '17 at 10:40

Lukas Kalbertodt

What I personally would like to see is a `const fn` story sufficiently advanced to allow allocating memory, such that a Regex could be compiled at compile-time without any trick... may never see the light of day and would not help much for the Lazy DFA approach. – Matthieu M. Jan 06 '17 at 11:40

I think "potential to run a lot faster" is overstating things. Otherwise, good answer. :-) – BurntSushi5 Jan 06 '17 at 12:16

@BurntSushi5 psssh, what do you even know about it anyway? (For others: BurntSushi5 is the primary author of the regex crate ^_^) – Shepmaster Jan 06 '17 at 13:22

@BurntSushi5 I slightly edited the sentence ;-) I guess I just really like the idea that regexes could compile down to the exact code I would have written by hand (or probably even better code!). – Lukas Kalbertodt Jan 06 '17 at 13:26
