47

Is there any difference in speed/memory usage for these two equivalent expressions:

Regex.IsMatch(Message, "1000")

vs.

Message.Contains("1000")

Are there any situations where one is better than the other?

The context of this question is as follows: I was making some changes to legacy code that used a Regex expression to check whether one string is contained within another. Because it was legacy code, I left that part unchanged, and in the code review somebody suggested that Regex.IsMatch should be replaced by string.Contains. So I was wondering whether the change was worth making.

Pradeep
  • possible duplicate of [regex VS Contains. Best Performance ?](http://stackoverflow.com/questions/2023792/regex-vs-contains-best-performance) – Ta01 Jun 03 '10 at 00:53
  • 3
    @Random, that's related, but a more complicated example. It's also using Java, which has a different regex syntax. – Matthew Flaschen Jun 03 '10 at 00:55

7 Answers

55

For simple cases String.Contains will give you better performance, but it does not let you do complex pattern matching. Use String.Contains for non-pattern-matching scenarios (like the one in your example) and use regular expressions when you need more complex pattern matching.

A regular expression carries a certain amount of overhead (expression parsing, compilation, execution, etc.) that a simple method like String.Contains does not have, which is why String.Contains will outperform a regular expression in examples like yours.
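To make the literal-versus-pattern distinction concrete, here is a minimal sketch. The class name `LiteralVsPattern` and the sample strings are made up for illustration. It shows that the second argument of Regex.IsMatch is parsed as a pattern, so a search text containing metacharacters has to go through Regex.Escape before it behaves like String.Contains:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralVsPattern
{
    // Literal substring test: no pattern parsing involved.
    public static bool ContainsLiteral(string message, string literal)
    {
        return message.Contains(literal);
    }

    // Regex test that treats the search text as a literal by escaping metacharacters.
    public static bool RegexLiteral(string message, string literal)
    {
        return Regex.IsMatch(message, Regex.Escape(literal));
    }

    public static void Main()
    {
        // For a plain literal such as "1000" the two calls agree.
        Console.WriteLine(ContainsLiteral("code 1000 received", "1000")); // True
        Console.WriteLine(Regex.IsMatch("code 1000 received", "1000"));   // True

        // With a metacharacter they diverge: '.' matches any character in a regex.
        Console.WriteLine(ContainsLiteral("total 1x000", "1.000"));       // False
        Console.WriteLine(Regex.IsMatch("total 1x000", "1.000"));         // True
        Console.WriteLine(RegexLiteral("total 1x000", "1.000"));          // False
    }
}
```

For the "1000" case in the question both calls give the same answer, so the choice comes down to cost and readability rather than correctness.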

Andrew Hare
  • 1
    Obviously this is a very old answer, but maybe you could give your sources? @user279470 seems to present some very good evidence that this answer is wrong. – JDB Oct 02 '12 at 13:42
  • 1
    and then there's this benchmark that agrees with Andrew http://stackoverflow.com/a/17579471/289992 Take your pick! – bottlenecked Feb 12 '14 at 16:32
41

String.Contains is slower when you compare it to a compiled regular expression. Considerably slower, even!

You can test it by running this benchmark:

using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
  public static int FoundString;
  public static int FoundRegex;

  static void DoLoop(bool show)
  {
    const string path = "C:\\file.txt";
    const int iterations = 1000000;
    var content = File.ReadAllText(path);

    const string searchString = "this exists in file";
    var searchRegex = new Regex("this exists in file");

    var containsTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (content.Contains(searchString))
      {
        FoundString++;
      }
    }
    containsTimer.Stop();

    var regexTimer = Stopwatch.StartNew();
    for (var i = 0; i < iterations; i++)
    {
      if (searchRegex.IsMatch(content))
      {
        FoundRegex++;
      }
    }
    regexTimer.Stop();

    if (!show) return;

    Console.WriteLine("FoundString: {0}", FoundString);
    Console.WriteLine("FoundRegex: {0}", FoundRegex);
    Console.WriteLine("containsTimer: {0}", containsTimer.ElapsedMilliseconds);
    Console.WriteLine("regexTimer: {0}", regexTimer.ElapsedMilliseconds);

    Console.ReadLine();
  }

  static void Main(string[] args)
  {
    DoLoop(false);
    DoLoop(true);
    return;
  }
}
Wayne Koorts
user279470
  • 4
    Running it on a random EDIFACT INVRP file of 60kb with "this exists in file" stuffed in halfway through: containsTimer: 84925 regexTimer: 10633 – user279470 Sep 01 '10 at 10:23
  • 3
    Though it's not String.Contains(), I just modified a search & replace function in my program to use a compiled Regex object instead of 'Value.ToString.IndexOf(SearchString, StringComparison.CurrentCultureIgnoreCase)'. My replace-all test using a >44,000 row DataGridView (1,921 replacements) went from ~7.5 minutes to ~30 seconds. – Ski Oct 13 '11 at 15:16
  • If you add .IndexOf to this, it's even slower than .Contains. Regex is just awesome. – Chad Schouggins Aug 27 '13 at 18:16
  • 2
    content.Contains win. because `new Regex("this exists in file")` != `content.Contains(searchString)`... but `new Regex(".+this exists in file.+")`. – dovid Mar 26 '17 at 08:19
  • Yes, when `Contains` finds the search string close to the start of the string, `Contains` wins. Otherwise regex wins. I guess we should say regex has better performance overall, but `Contains` wins for small strings (let's say below 1000-2000 chars). – mkb Jan 17 '21 at 03:08
9

To determine which is faster you will have to benchmark on your own system. However, regular expressions are complex, and chances are that String.Contains() will be the faster and, in your case, also the simpler solution.

The implementation of String.Contains() eventually calls the native method IndexOfString(), whose implementation is known only to Microsoft. However, a good algorithm for this kind of search is the Knuth–Morris–Pratt algorithm. Its complexity is O(m + n), where m is the length of the string you are searching for and n is the length of the string you are searching in, which makes it very efficient.
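As an illustration of the Knuth–Morris–Pratt idea, here is a minimal sketch using ordinal character comparison. It is not the algorithm String.Contains() actually uses, and the class and method names (`KmpSearch`, `BuildFailureTable`) are hypothetical. The failure table lets the search resume after a mismatch without re-reading text characters, which is where the O(m + n) bound comes from:

```csharp
using System;

public static class KmpSearch
{
    // failure[i] is the length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    private static int[] BuildFailureTable(string pattern)
    {
        var failure = new int[pattern.Length];
        int k = 0;
        for (int i = 1; i < pattern.Length; i++)
        {
            while (k > 0 && pattern[i] != pattern[k])
                k = failure[k - 1];
            if (pattern[i] == pattern[k])
                k++;
            failure[i] = k;
        }
        return failure;
    }

    // Returns the index of the first occurrence of pattern in text, or -1 if absent.
    public static int IndexOf(string text, string pattern)
    {
        if (pattern.Length == 0) return 0;
        int[] failure = BuildFailureTable(pattern);
        int k = 0; // number of pattern characters matched so far
        for (int i = 0; i < text.Length; i++)
        {
            // On a mismatch, fall back in the pattern instead of rewinding the text.
            while (k > 0 && text[i] != pattern[k])
                k = failure[k - 1];
            if (text[i] == pattern[k])
                k++;
            if (k == pattern.Length)
                return i - pattern.Length + 1;
        }
        return -1;
    }

    public static void Main()
    {
        Console.WriteLine(IndexOf("error code 1000 received", "1000")); // 11
        Console.WriteLine(IndexOf("error code 100 received", "1000"));  // -1
    }
}
```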

Actually, the complexity of a regular expression search can be as low as O(n), depending on the implementation, so it may still be competitive in some situations. Only a benchmark will be able to determine this.

If you are really concerned about search speed, Christian Charras and Thierry Lecroq have a lot of material about exact string matching algorithms at Université de Rouen.

Martin Liversage
6

@user279470 I was looking for an efficient way to count words just for fun and came across this. I gave it the OpenOffice Thesaurus dat file to iterate through. The total word count came to 1575423.

Now, my end goal didn't have a use for Contains, but what was interesting was seeing the different ways you can call regex that make it even faster. I created some other methods to compare an instance use of regex and a static use with RegexOptions.Compiled.

using System.Text.RegularExpressions;

public static class WordCount
{
    /// <summary>
    /// Count words with instantiated Regex.
    /// </summary>
    public static int CountWords4(string s)
    {
        Regex r = new Regex(@"[\S]+");
        MatchCollection collection = r.Matches(s);
        return collection.Count;
    }
    /// <summary>
    /// Count words with static compiled Regex.
    /// </summary>
    public static int CountWords1(string s)
    {
        MatchCollection collection = Regex.Matches(s, @"[\S]+", RegexOptions.Compiled);
        return collection.Count;
    }
    /// <summary>
    /// Count words with static Regex.
    /// </summary>
    public static int CountWords3(string s)
    {
        MatchCollection collection = Regex.Matches(s, @"[\S]+");
        return collection.Count;
    }

    /// <summary>
    /// Count words with loop and character tests.
    /// </summary>
    public static int CountWords2(string s)
    {
        int c = 0;
        for (int i = 1; i < s.Length; i++)
        {
            if (char.IsWhiteSpace(s[i - 1]) == true)
            {
                if (char.IsLetterOrDigit(s[i]) == true ||
                    char.IsPunctuation(s[i]))
                {
                    c++;
                }
            }
        }
        if (s.Length > 2)
        {
            c++;
        }
        return c;
    }
}
  • regExCompileTimer.ElapsedMilliseconds 11787
  • regExStaticTimer.ElapsedMilliseconds 12300
  • regExInstanceTimer.ElapsedMilliseconds 13925
  • ContainsTimer.ElapsedMilliseconds 1074
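One way to avoid constructing the Regex on every call, sketched here under the assumption that the pattern is fixed and known up front, is to build it once in a static field and reuse it. The class and field names are hypothetical, and this variant was not part of the timings above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CachedWordCount
{
    // Built once per process. RegexOptions.Compiled costs more at construction
    // time in exchange for faster matching afterwards.
    private static readonly Regex WordPattern = new Regex(@"\S+", RegexOptions.Compiled);

    // Counts runs of non-whitespace characters.
    public static int CountWords(string s)
    {
        return WordPattern.Matches(s).Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountWords("the quick  brown fox")); // 4
    }
}
```

Whether this beats the static Regex.Matches overloads on a given workload would need to be measured.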
spring1975
4

My own benchmarks appear to contradict user279470's benchmark results.

In my use case I wanted to check a simple Regex with some OR operators for 4 values versus doing 4 x String.Contains().

Even with 4 x String.Contains(), I found that String.Contains() was 5 x faster.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System.Text.RegularExpressions;

namespace App.Tests.Performance
{
    [TestClass]
    public class PerformanceTesting
    {
        private static Random random = new Random();

        [TestMethod]
        public void RegexVsMultipleContains()
        {
            var matchRegex = new Regex("INFO|WARN|ERROR|FATAL");

            var testStrings = new List<string>();

            int iterator = 1000000 / 4; // div 4 for each of log levels checked

            for (int i = 0; i < iterator; i++)
            {
                for (int j = 0; j < 4; j++)
                {
                    var simulatedTestString = RandomString(50);

                    if (j == 0)
                    {
                        simulatedTestString += "INFO";
                    }
                    else if (j == 1)
                    {
                        simulatedTestString += "WARN";
                    }
                    else if (j == 2)
                    {
                        simulatedTestString += "ERROR";
                    }
                    else if (j == 3)
                    {
                        simulatedTestString += "FATAL";
                    }

                    simulatedTestString += RandomString(50);

                    testStrings.Add(simulatedTestString);
                }
            }

            int cnt;
            Stopwatch sw;

            //////////////////////////////////////////
            // Multiple contains test
            //////////////////////////////////////////

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = testStrings[i].Contains("INFO") || testStrings[i].Contains("WARN") || testStrings[i].Contains("ERROR") || testStrings[i].Contains("FATAL");

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("MULTIPLE CONTAINS: " + cnt + " " + sw.ElapsedMilliseconds);

            //////////////////////////////////////////
            // Multiple contains using list test
            //////////////////////////////////////////

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            var searchStringList = new List<string> { "INFO", "WARN", "ERROR", "FATAL" };

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = searchStringList.Any(x => testStrings[i].Contains(x));

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("MULTIPLE CONTAINS USING LIST: " + cnt + " " + sw.ElapsedMilliseconds);

            //////////////////////////////////////////
            // Regex test
            ////////////////////////////////////////// 

            cnt = 0;
            sw = new Stopwatch();

            sw.Start();

            for (int i = 0; i < testStrings.Count; i++)
            {
                bool isMatch = matchRegex.IsMatch(testStrings[i]);

                if (isMatch)
                {
                    cnt += 1;
                }
            }

            sw.Stop();

            Console.WriteLine("REGEX: " + cnt + " " + sw.ElapsedMilliseconds);
        }

        public static string RandomString(int length)
        {
            const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

            return new string(Enumerable.Repeat(chars, length).Select(s => s[random.Next(s.Length)]).ToArray());
        }
    }
}
gbro3n
  • For small strings Contains wins, but when I replaced the random string addition line with `simulatedTestString += RandomString(500);` the results were almost equal. For bigger strings I believe regex will win. – mkb Jan 17 '21 at 03:17
1

Yes, for this task, string.Contains will almost certainly be faster and use less memory. And of course, there's no reason to use regex here.

Matthew Flaschen
0

Regex matches are handled by the .NET Framework's regular expression engine. Below is the part of the MSDN article that specifies when to use Regex versus the String search and replacement methods:

The String class includes a number of string search and replacement methods that you can use when you want to locate literal strings in a larger string. Regular expressions are most useful either when you want to locate one of several substrings in a larger string, or when you want to identify patterns in a string, as the following examples illustrate.

There are two examples illustrating the same in the article.
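The two cases in the quoted passage can be sketched as follows. This is not the article's own example. The class name, field names and sample strings are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhenToUseRegex
{
    // One of several substrings: alternation in a single pattern.
    private static readonly Regex LogLevel = new Regex("INFO|WARN|ERROR|FATAL");

    // A pattern rather than a fixed string: four consecutive digits.
    private static readonly Regex FourDigits = new Regex(@"\d{4}");

    public static bool HasLogLevel(string line) { return LogLevel.IsMatch(line); }

    public static bool HasFourDigits(string line) { return FourDigits.IsMatch(line); }

    public static void Main()
    {
        // A single literal: the String method is enough.
        Console.WriteLine("code 1000".Contains("1000"));       // True

        // Several alternatives or a shape of text: a regular expression fits.
        Console.WriteLine(HasLogLevel("12:00 WARN disk low")); // True
        Console.WriteLine(HasFourDigits("code 2048"));         // True
        Console.WriteLine(HasFourDigits("code 204"));          // False
    }
}
```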

Ayushmati