Non-cached:

var sw = Stopwatch.StartNew();
foreach (var str in testStrings)
{
    foreach (var pair in flex)
    {
        if (Regex.IsMatch(str, "^(" + pair.Value + ")$", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture))
            ;
    }
}
Console.WriteLine("\nRan in {0} ms", sw.ElapsedMilliseconds); // 76 ms

Cached:

var cache = flex.ToDictionary(p => p.Key, p => new Regex("^(" + p.Value + ")$", RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture | RegexOptions.Compiled));

var sw = Stopwatch.StartNew();
foreach (var str in testStrings)
{
    foreach (var pair in cache)
    {
        if(pair.Value.IsMatch(str))
            ;
    }
}
Console.WriteLine("\nRan in {0} ms", sw.ElapsedMilliseconds); // 263 ms

I don't know why it's running slower when I pre-compile all the regexes. Not to mention the iterator on flex should be slower too, because it needs to do more calculations.

What could be causing this?


Actually, if I take off the Compiled switch it runs in 8 ms when cached. I thought "compiled" would compile it upon construction of the regex. If not, when does it do so?

mpen
  • If you don't put a `for` around this code and run it for 1M iterations or so, any measurements will probably be drowned in the noise. – Jon Mar 19 '11 at 21:56
  • It compiles the regexes to C# code, right? Is the resulting code JITted outside of the loop, or inside? – Joren Mar 19 '11 at 22:00
  • @Jon: Considering I'm getting 263 ms without the loop, I think 1M is pushing it. With 5000 iterations, using `cache` but non-compiled it takes 3282ms. Compiled it takes 3663ms. Still a bit slower, but a smaller magnitude. – mpen Mar 19 '11 at 22:01
  • Just found out that Regexes are cached in-memory after their first use, even when the Compiled option is turned off. – mpen Mar 19 '11 at 22:04
  • This seems quite weird IMHO. Can you try with a single regexp (not iterating through the dict) and see if there is any significant difference? – Can Gencer Mar 19 '11 at 23:28

2 Answers

1

Regexes are in fact cached not just on first use, but upon construction (judging from the 4.0 code in Reflector; it may not be precisely so in other framework versions).

As such, the main differences here are:

  1. There's some trivial string concatenation in the latter that isn't in the former, along with the overhead of construction outside of the compilation of the Regex.
  2. There's a different collection being iterated through in the latter than in the former.

It's not clear what sort of collection flex is. If it's not a dictionary, then I wouldn't be at all surprised by this, as dictionaries aren't terribly fast at enumeration and hence most other collections will beat them.

This aside, it really isn't a case of caching in the latter, since it's caching something that's already going to be retrieved from an in-memory cache, so there's no reason to suspect the latter would be any faster.
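To see the in-memory cache at work, here is a minimal sketch that times the first static `Regex.IsMatch` call against a batch of later calls with the same pattern and options. The pattern, the input, and the `StaticCacheSketch` and `CountStaticMatches` names are made up for illustration and are not from the question. The first measurement also includes JIT time for the helper itself, so treat the numbers as a rough indication only.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class StaticCacheSketch
{
    // Calls the static Regex.IsMatch repeatedly and returns how many calls matched.
    // After the first call, the parsed pattern should be served from Regex's static cache.
    public static int CountStaticMatches(string input, string pattern, RegexOptions options, int iterations)
    {
        int matches = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (Regex.IsMatch(input, pattern, options))
                matches++;
        }
        return matches;
    }

    static void Main()
    {
        const string pattern = "^(foo|bar)$"; // hypothetical stand-in for one flex pattern
        const RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture;

        // Number of parsed patterns the static methods keep around.
        Console.WriteLine("Regex.CacheSize = {0}", Regex.CacheSize);

        var sw = Stopwatch.StartNew();
        CountStaticMatches("FOO", pattern, options, 1);
        Console.WriteLine("first static call: {0} ticks", sw.ElapsedTicks);

        sw.Restart();
        CountStaticMatches("FOO", pattern, options, 10000);
        Console.WriteLine("next 10000 calls:  {0} ticks", sw.ElapsedTicks);
    }
}
```

One caveat: the static cache only holds `Regex.CacheSize` entries (15 by default), so if flex contains more distinct patterns than that, looping over all of them can evict entries before they are reused. Raising `Regex.CacheSize` or holding `Regex` instances yourself avoids that.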

Jon Hanna
  • I linked to the flex class in my post. The enumerator actually uses a dict internally and then does quite a bit more work on top of that. Also, I think you got your "former" and "latter" swapped in your first point, which all indicates that the latter should be faster. Anyway, I've concluded that none of those things are significant.... I'll post my answer. – mpen Mar 22 '11 at 02:06
0

The problem is with the RegexOptions.Compiled flag. That actually makes it run a lot slower. Jeff sort of explains this in his blog. Without this flag, the cached version is much faster.
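A rough way to see where the time goes is to measure construction, the first `IsMatch` call, and later calls separately, with and without the flag. This is only a sketch: the pattern, the input, and the `CompiledCostSketch` and `Measure` names are hypothetical, and the actual numbers will vary by machine and framework version.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class CompiledCostSketch
{
    // Times construction, the first IsMatch call, and a batch of later calls separately.
    // Returns the result of the first match so callers can sanity-check the pattern.
    public static bool Measure(string label, string pattern, RegexOptions options, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        long construct = sw.ElapsedTicks;

        sw.Restart();
        bool first = regex.IsMatch(input);
        long firstCall = sw.ElapsedTicks;

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            regex.IsMatch(input);
        long later = sw.ElapsedTicks;

        Console.WriteLine("{0,-12} construct={1} first={2} later({3})={4} ticks",
            label, construct, firstCall, iterations, later);
        return first;
    }

    static void Main()
    {
        const string pattern = "^(foo|bar)$"; // hypothetical stand-in for one flex pattern
        const RegexOptions baseOptions = RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture;

        Measure("interpreted", pattern, baseOptions, "FOO", 100000);
        Measure("compiled", pattern, baseOptions | RegexOptions.Compiled, "FOO", 100000);
    }
}
```

If the compiled variant shows a large `construct` or `first` figure but a smaller `later` figure, the flag is paying a one-time cost per regex that a short benchmark like the one in the question never gets to amortize.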

mpen