I have a situation where my regular expressions compile extremely slowly on Windows Server 2008. I wrote a small console application to highlight this issue. The app generates its own input and builds up a Regex from words in an XML file. I built a release version of this app and ran it both on my personal laptop (running XP) and the Windows 2008 server. The regular expression took 0.21 seconds to compile on my laptop, but 23 seconds to compile on the server.

Any ideas what could be causing this? The problem occurs only on the first use of the Regex (when it is first compiled); thereafter it is fine.

I have also found another problem - when using \s+ in the regular expression on the same Windows 2008 server, the memory balloons (uses 4GB+) and the compilation of the Regex never finishes.

Is there a known issue with Regex and 64 bit .net? Is there a fix/patch available for this? I cannot really find any info on the net, but I have found a few articles about this same issue in Framework 2.0 - surely this has been fixed by now?

More info: The server is running the 64 bit version of the .net framework (3.5 SP1) and on my laptop I have Visual Studio 2008 and the 3.5 framework installed. The regular expression is of the following pattern: ^word$|^word$|^word$ and is constructed with the following flags: RegexOptions.IgnoreCase | RegexOptions.Compiled


Here is a code snippet:

// Build one alternation of anchored words: ^word$|^word$|...
StringBuilder regexString = new StringBuilder();
if (!String.IsNullOrEmpty(fileLocation))
{
    XmlTextReader textReader = new XmlTextReader(fileLocation);
    textReader.Read();
    while (textReader.Read())
    {
        textReader.MoveToElement();
        if (textReader.Name == "word")
        {
            // Attribute 0 is the "text" attribute of <word text="..." />
            regexString.Append("^" + textReader.GetAttribute(0) + "$|");
        }
    }
    // Drop the trailing '|' before constructing the Regex
    ProfanityFilter = new Regex(regexString.ToString(0, regexString.Length - 1), RegexOptions.IgnoreCase | RegexOptions.Compiled);
}

// Time the first IsMatch call on the freshly constructed Regex
DateTime time = DateTime.Now;
Console.WriteLine("\nIsProfane:\n" + ProfanityFilter.IsMatch("test"));
Console.WriteLine("\nTime: " + (DateTime.Now - time).TotalSeconds);
Console.ReadKey();

This results in a time of 0.21 seconds on my laptop and 23 seconds on the 2008 server. The XML file consists of 168 words in the following format:

<word text="test" />
Laurel
pjmyburg

3 Answers

You can pre-compile your regexes using the Regex.CompileToAssembly method, and then you could deploy the compiled regexes to your server.
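
As a rough illustration of this suggestion, the sketch below writes a regex into a standalone assembly with Regex.CompileToAssembly. The word list, the type name ProfanityRegex, the namespace Filters.Generated, and the assembly name ProfanityFilter are all hypothetical, and the grouped, escaped pattern is a variation on the one in the question. As far as I know this method only works on the .NET Framework; on .NET Core and later it throws PlatformNotSupportedException.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Text.RegularExpressions;

public static class FilterPrecompiler
{
    // Describes one regex type to emit. The type name and namespace
    // are made up for this sketch.
    public static RegexCompilationInfo BuildInfo(string[] words)
    {
        // Escape each word so metacharacters in the list are taken literally.
        string alternatives = string.Join("|", words.Select(w => Regex.Escape(w)).ToArray());
        string pattern = "^(?:" + alternatives + ")$";
        return new RegexCompilationInfo(
            pattern, RegexOptions.IgnoreCase, "ProfanityRegex", "Filters.Generated", true);
    }

    public static void Main()
    {
        // Placeholder words; in the question they come from the XML file.
        string[] words = { "test", "sample", "example" };

        // Writes ProfanityFilter.dll to the current directory (.NET Framework only).
        Regex.CompileToAssembly(
            new[] { BuildInfo(words) }, new AssemblyName("ProfanityFilter"));
        Console.WriteLine("Wrote ProfanityFilter.dll");
    }
}
```

An application that references the generated DLL would then, under the names assumed above, create the filter with new Filters.Generated.ProfanityRegex() and call IsMatch on it.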

Laurel
Polyfun
  • Yes, but that means that the non-technical administrators of the service cannot just add a word to an XML file - the DLLs would need to be recompiled every time. Good suggestion though. – pjmyburg Sep 30 '09 at 15:24
  • I think he meant that after you read in the file, you use the RegexOptions.Compiled option to optimize the execution of the regex. – brianary Sep 30 '09 at 15:35
  • No, he meant you pre-compile the Regex to a DLL file (assembly) - that is what the CompileToAssembly method does. The RegexOptions.Compiled flag is the cause of this whole issue. That is indeed the way I would want to go, but it seems there is a bug in the 64 bit .net libraries. – pjmyburg Oct 01 '09 at 06:29

I found a solution; granted, it is not the correct one, but it is perfect in my case. For some reason, if I leave out the RegexOptions.Compiled flag, the Regex is much, much faster. I even managed to execute the Regex on 100 long phrases in under 65 milliseconds on the 2008 server.

This must be a bug in the .net lib as the uncompiled version is supposed to be much slower than the compiled version. Either way, under 1 millisecond per check is very much acceptable for me :)
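
A minimal sketch of this workaround, for anyone who wants to try it on their own machine: the filter is built without RegexOptions.Compiled, and construction plus the first match is timed with Stopwatch. The word list is a placeholder for the XML file from the question, and Regex.Escape is an addition that is not in the original code.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class InterpretedFilterDemo
{
    // Same pattern shape as in the question (^word$|^word$|...),
    // but without RegexOptions.Compiled.
    public static Regex BuildFilter(string[] words)
    {
        string pattern = string.Join(
            "|", words.Select(w => "^" + Regex.Escape(w) + "$").ToArray());
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        // Placeholder words standing in for the XML word list.
        string[] words = { "test", "sample", "example" };

        // Time construction and the first match together, since the
        // one-off cost is what the question is about.
        Stopwatch sw = Stopwatch.StartNew();
        Regex filter = BuildFilter(words);
        bool isProfane = filter.IsMatch("test");
        sw.Stop();

        Console.WriteLine("IsProfane: " + isProfane);
        Console.WriteLine("Construction + first match: " + sw.Elapsed.TotalMilliseconds + " ms");
    }
}
```

Adding RegexOptions.Compiled to the options in BuildFilter and running both variants on the same machine gives a direct comparison; the numbers will depend on the framework version and on whether the process runs as x86 or x64.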

Bhushan Firake
pjmyburg
  • You may also want to experiment with more alternative regex patterns to find the optimal one, such as /^(word|word|word|word)$/ instead of /^word$|^word$|^word$/ . – brianary Sep 30 '09 at 15:38
  • Yes, I am aware of that. Like I mentioned in the original question, I wrote a console application merely to highlight the problem. That exact same Regex compiles in 0.21 seconds on my laptop, so it should not need to compile for 23 seconds on a 64 bit server. – pjmyburg Oct 01 '09 at 06:31
  • Had the same issue and solution. With it set to compiled, it ran fine on my local XP box, but when uploaded to the server it was taking 40+ seconds per regex. Removed the compiled option and 8 calls now take less than 1 second total. – ManiacZX Aug 04 '10 at 20:03
  • Compiles or runs? Also, how do you measure performance? There are 1000 ways to do that incorrectly, which means showing incorrect results and making incorrect assumptions and decisions. Remember the overhead of the first run, as well as the target configuration (Debug|Release) and architecture (x86|x64|Any CPU) - and these are just the basics. – abatishchev Jun 30 '13 at 05:53
  • The reason the improvement happens is that if you have the `RegexOptions.Compiled` flag, it will compile the regex every time an instance is created; if you leave it out, it will compile it once and then store it in a cache. The rule of thumb is that if you are using the compiled flag, you must use the object that was created more than once for it to be worth it. [Here is another SO Answer](http://stackoverflow.com/questions/513412/how-does-regexoptions-compiled-work) that has benchmarks and explains it more. – Scott Chamberlain Jun 30 '13 at 06:07
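
Two of the suggestions in the comments above can be sketched together: the factored pattern ^(word|word|word)$ and reusing one Regex instance instead of building a new one per check. The class name, the word list, and the choice of a static readonly field are assumptions made for illustration, not something from the thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SharedFilter
{
    // Placeholder words standing in for the XML word list.
    private static readonly string[] Words = { "test", "sample", "example" };

    // Built once when the type is first used, then shared by every call.
    // RegexOptions.Compiled is left out here, following the findings in
    // this thread; it could be added back if measurements justify it.
    private static readonly Regex Filter =
        new Regex(BuildPattern(Words), RegexOptions.IgnoreCase);

    // Factored form: one pair of anchors around a single alternation.
    public static string BuildPattern(string[] words)
    {
        return "^(?:" + string.Join("|", words.Select(w => Regex.Escape(w)).ToArray()) + ")$";
    }

    public static bool IsProfane(string input)
    {
        return Filter.IsMatch(input);
    }

    public static void Main()
    {
        foreach (string input in new[] { "test", "Sample", "harmless" })
        {
            Console.WriteLine(input + ": " + IsProfane(input));
        }
    }
}
```

Because the Regex lives in a static field, any one-off construction cost is paid a single time per process, whichever options are used.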

I ran into the exact same problem. My app works fine on x86 machines but memory balloons and hangs on x64. Removing the compilation flag did not help. I tried this today on .net 4.0 and the problem remains. If you have a repro, I suggest you file a bug.

I think MSFT knows about this, see the bottom comment here

But let them decide if this is the same bug. Please add a link to your filing here if you file so I can add my comments to it.

abatishchev
Barka
  • I came across the same problem with a .NET 4.0 application running on a Windows Server 2008 R2 64 bit machine. Any news on the subject? Doron – DoronBM Mar 06 '12 at 18:44
  • @DoronBM, please comment on the Microsoft bug database link above and work to escalate it with your Microsoft representative. Thanks! As far as I know there has been no resolution. – Barka Mar 07 '12 at 18:56