5

Since .NET has no case-insensitive string.Contains() (yet a case-insensitive version of string.Equals() exists, which baffles me, but I digress), what is the performance difference between using Regex.IsMatch() and using String.ToUpper().Contains()?

Example:

string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

bool containsStringRegEx = Regex.IsMatch(testString, "string", RegexOptions.IgnoreCase);
bool containsString = testString.ToUpper().Contains("STRING");

I've always heard that string.ToUpper() is a very expensive call, so I shy away from using it when I want to do string.Contains() comparisons, but how does Regex.IsMatch() compare in terms of performance?

Is there a more efficient approach for doing such comparisons?

Saggio
  • 4
    Have you tried using [Stopwatch](http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx)? – Sayse Jul 10 '13 at 19:34
  • 2
    The only way to know which one is faster is to run them both and time them. This might help: http://stackoverflow.com/questions/457605/how-to-measure-code-performance-in-net – kevingessner Jul 10 '13 at 19:35
  • 3
    What about `testString.IndexOf("string", StringComparison.CurrentCultureIgnoreCase) >= 0`? – Justin Niessner Jul 10 '13 at 19:35
  • 1
    String.Contains is just a wrapper around `IndexOf(value, StringComparison.Ordinal) >=0` – IngisKahn Jul 10 '13 at 19:36
  • 3
    IN MOST CASES avoid using ToLower / ToUpper for such things. It's bad practise – Fabian Bigler Jul 10 '13 at 19:37
  • @FabianBigler care to elaborate on that? I can think of a few scenarios where it's perfectly fine to use `ToUpper`/`ToLower` in string comparisons. There can be anomalies when using them based on the culture, it's always safer to use `ToLowerInvariant`/ `ToUpperInvariant`. – James Jul 10 '13 at 19:38
  • 1
    @James I can't. If you are using any function of `String` that has an overload that takes in a `StringComparison` enum, there is NEVER a reason to use ToUpper/ToLower. And to your culture argument, that is what `OrdinalIgnoreCase` is for. – Scott Chamberlain Jul 10 '13 at 19:39
  • 3
    ToUpper / ToLower may trick you if you support a global world with many languages. – Michael Viktor Starberg Jul 10 '13 at 19:40
  • @ScottChamberlain of course when you have `StringComparison`, however, there are other scenarios where a case insensitive search can't be done and using `ToUpper`/`ToLower` is a good alternative. – James Jul 10 '13 at 19:40
  • @James Yes, you're right. I was a bit too harsh, so I corrected myself to 'most cases'. But in the vast majority of situations you should be able to solve your problem with StringComparison. – Fabian Bigler Jul 10 '13 at 19:40
  • 1
    @MichaelViktorStarberg yep, hence my comment regarding `ToUpperInvariant`/`ToLowerInvariant`. – James Jul 10 '13 at 19:42
  • @James Yes, but that would not be "situations like these" (the text you originally replied to before it was edited) I can think of situations too, like using a `Switch` statement, but I think that falls firmly in the "different situation" category. – Scott Chamberlain Jul 10 '13 at 19:42
  • @ScottChamberlain just to clarify, I am not saying it's ok to use `ToLower`/`ToUpper` if you have the option to use `StringComparison` as clearly `StringComparison` is far more reliable - I was merely responding to the comment as it was originally stated which was along the lines of "*NEVER use `ToLower`/`ToUpper` for string comparison.*" – James Jul 10 '13 at 19:45
  • On the other hand R# complains every time you do .ToString() - Enough already! =) – Michael Viktor Starberg Jul 10 '13 at 19:46
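
The culture concern raised in the comments can be illustrated with a small sketch. The class and method names (`CultureCaseDemo`, `ContainsViaToUpper`, `ContainsOrdinalIgnoreCase`, `Run`) and the choice of the `tr-TR` culture are assumptions made for illustration, not something from the thread. What the culture-sensitive call returns depends on the culture data available at runtime, so no output is claimed for it here.

```csharp
using System;
using System.Globalization;

public static class CultureCaseDemo
{
    // Fragile approach: upper-cases the source with the given culture's rules,
    // then does an ordinal Contains against an already upper-cased value.
    public static bool ContainsViaToUpper(string source, string upperCaseValue, CultureInfo culture)
    {
        return source.ToUpper(culture).Contains(upperCaseValue);
    }

    // Culture-independent approach suggested in the comments.
    public static bool ContainsOrdinalIgnoreCase(string source, string value)
    {
        return source.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Prints the result of each approach; call this from Main.
    public static void Run()
    {
        // Assumption: tr-TR is used only as an example of a culture whose
        // casing rules for 'i' can differ from the invariant culture.
        CultureInfo turkish = CultureInfo.GetCultureInfo("tr-TR");

        Console.WriteLine("Invariant ToUpper: " + ContainsViaToUpper("title", "TITLE", CultureInfo.InvariantCulture));
        Console.WriteLine("tr-TR ToUpper:     " + ContainsViaToUpper("title", "TITLE", turkish));
        Console.WriteLine("OrdinalIgnoreCase: " + ContainsOrdinalIgnoreCase("title", "TITLE"));
    }
}
```

If the second line differs from the first on your machine, that is the kind of anomaly the comments warn about; the `OrdinalIgnoreCase` result does not depend on the culture.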

3 Answers

16

Here's a benchmark:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();

        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

        // Note: construction of the compiled Regex is included in the timed region.
        sw.Start();
        var re = new Regex("string", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
        for (int i = 0; i < 1000000; i++)
        {
            bool containsString = re.IsMatch(testString);
        }
        sw.Stop();
        Console.WriteLine("RX: " + sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            bool containsStringToUpper = testString.ToUpper().Contains("STRING");
        }


        sw.Stop();
        Console.WriteLine("Contains: " + sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < 1000000; i++)
        {
            bool containsStringIndexOf = testString.IndexOf("STRING", StringComparison.OrdinalIgnoreCase) >= 0;
        }


        sw.Stop();
        Console.WriteLine("IndexOf: " + sw.ElapsedMilliseconds);
    }
}

Results, fastest first:

IndexOf (183ms), then ToUpper().Contains (400ms), then Regex (477ms)

(Updated output times using the compiled Regex)
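
As a variation on the benchmark above, here is a minimal sketch that builds the Regex outside the timed region and runs an untimed warm-up before each measurement. The `ContainsBenchmark` class, the `Measure` helper, the `Run` method and the warm-up count of 1000 are assumptions for illustration, not part of the original answer. Each call goes through a delegate, which adds a small, roughly equal overhead to all three measurements, and the timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class ContainsBenchmark
{
    // Runs the action untimed a few times (warm-up), then times `iterations` calls
    // and returns the elapsed milliseconds.
    public static long Measure(Func<bool> action, int iterations)
    {
        bool sink = false;
        for (int i = 0; i < 1000; i++)
        {
            sink ^= action(); // warm-up, not timed
        }

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink ^= action();
        }
        sw.Stop();

        GC.KeepAlive(sink); // keep the results observable
        return sw.ElapsedMilliseconds;
    }

    // Times the three approaches from the answer; call this from Main.
    public static void Run()
    {
        const int iterations = 1000000;
        string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";

        // Constructed once, outside the timed region.
        Regex re = new Regex("string", RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);

        Console.WriteLine("RX: " + Measure(() => re.IsMatch(testString), iterations));
        Console.WriteLine("Contains: " + Measure(() => testString.ToUpper().Contains("STRING"), iterations));
        Console.WriteLine("IndexOf: " + Measure(() => testString.IndexOf("STRING", StringComparison.OrdinalIgnoreCase) >= 0, iterations));
    }
}
```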

keyboardP
  • 1
    Results on my PC: RX: 3032 Contains: 385 IndexOf: 97 (unoptimized build under mono) (PS. I made the Regex precompiled) – sehe Jul 10 '13 at 19:47
  • @Ata - Do you mean its impact on the results? I ran the code with an outside `for` loop over all three tests but the differences were negligible to the results above. – keyboardP Jul 10 '13 at 20:02
  • Wow, I would have never thought that `RegEx` was slower than `ToUpper().Contains()`, and it hadn't even crossed my mind to use `IndexOf() >= 0` to accomplish the task, let alone it being the quickest by far. Thanks! Looks like I will be using `IndexOf()` in the future – Saggio Jul 10 '13 at 21:20
  • @Saggio - You're welcome. Regex can be slower than string manipulation but it's also a lot more flexible and performs better for more complex parsing, so it's a case of using the right tool for the job. Personally, for basic string manipulation, I try and avoid regex ([see this post for an example](http://stackoverflow.com/questions/17457677/how-can-i-remove-none-alphabet-chars-from-a-string)) – keyboardP Jul 10 '13 at 21:25
10

There is another version using String.IndexOf(String,StringComparison) that might be more efficient than either of the two you suggested:

string testString = "tHiSISaSTRINGwiThInconSISteNTcaPITaLIZATion";
bool contained = testString.IndexOf("string", StringComparison.OrdinalIgnoreCase) >= 0;

If you need a culture-sensitive comparison, use CurrentCultureIgnoreCase instead of OrdinalIgnoreCase.
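
If you use this in many places, one option is to wrap the `IndexOf` call in an extension method. The sketch below is only an illustration: the class name `StringContainsExtensions` and the method name `ContainsIgnoreCase` are hypothetical, and the null checks are an added design choice.

```csharp
using System;

public static class StringContainsExtensions
{
    // Hypothetical helper: case-insensitive "contains" built on
    // String.IndexOf(String, StringComparison) with OrdinalIgnoreCase.
    public static bool ContainsIgnoreCase(this string source, string value)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (value == null) throw new ArgumentNullException(nameof(value));

        return source.IndexOf(value, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

With that in scope, the example above becomes `bool contained = testString.ContainsIgnoreCase("string");`.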

Douglas
0

I would expect Regex.Match to be slow, based on personal experience with regular expression parsers in general. But as many folks have mentioned, profiling it is the best way to find out for sure. I've had to fix performance issues related to regular expression parsers, whereas ToLower and ToUpper have never come back to bite me.

ridoy
John Lockwood