5

I have a function that is walking through the string looking for pattern and changing parts of it. I could optimize it by inserting

if (!text.Contains(pattern)) return;

But, I am actually walking through the whole string and comparing parts of it with the pattern, so the question is, how String.Contains() actually works? I know there was such a question - How does String.Contains work? but answer is rather unclear. So, if String.Contains() walks through the whole array of chars as well and compare them to pattern I am looking for as well, it wouldn't really make my function faster, but slower.

So, is it a good idea to attempt such an optimizations? And - is it possible for String.Contains() to be even faster than function that just walk through the whole array and compare every single character with some constant one?

Here is the code:

    public static char colorchar = (char)3;

    public static Client.RichTBox.ContentText color(string text, Client.RichTBox SBAB)
    {
        if (text.Contains(colorchar.ToString()))
        {
            int color = 0;
            bool closed = false;
            int position = 0;
            while (text.Length > position)
            {
                if (text[position] == colorchar)
                {
                    if (closed)
                    {
                        text = text.Substring(position, text.Length - position);
                        Client.RichTBox.ContentText Link = new Client.RichTBox.ContentText(ProtocolIrc.decode_text(text), SBAB, Configuration.CurrentSkin.mrcl[color]);
                        return Link;
                    }

                    if (!closed)
                    {
                        if (!int.TryParse(text[position + 1].ToString() + text[position + 2].ToString(), out color))
                        {
                            if (!int.TryParse(text[position + 1].ToString(), out color))
                            {
                                color = 0;
                            }
                        }
                        if (color > 9)
                        {
                            text = text.Remove(position, 3);
                        }
                        else
                        {
                            text = text.Remove(position, 2);
                        }
                        closed = true;
                        if (color < 16)
                        {
                            text = text.Substring(position);
                            break;
                        }
                    }
                }
                position++;
            }
        }
        return null;
    }
Community
  • 1
  • 1
Petr
  • 13,747
  • 20
  • 89
  • 144
  • You'd have to post (an outline of) the rest of the code. But yes, most likely you're doing double work here. – H H Apr 19 '13 at 09:02
  • 4
    Why don't you time both approaches in a for loop ? – Habib Apr 19 '13 at 09:03
  • @Habib that's not a bad idea :) – Petr Apr 19 '13 at 09:06
  • 2
    "So, is it a good idea to attempt such an optimizations?" Almost certainly not! This must be one of the most-used functions in the library. It has gotten a lot of attention. Hard to imagine you could improve on it except for very special cases and by really dedicated specialists. – Thilo Apr 19 '13 at 09:06
  • 1
    Going through a string, matching patterns, changing strings... maybe you also want to time how a Regex performs. – Corak Apr 19 '13 at 09:10
  • 1
    You also may want to replace your code that finds a position with IndexOf method of string. like `position = text.IndexOf(colorchar);` and then `position = text.IndexOf(colorchar, position);` – Andrei Zubov Apr 19 '13 at 09:13
  • Setup and debug and see the .NET code. Then you can step into the String.Contains .NET function and see it yourself. Here is link on how to do it http://msdn.microsoft.com/en-us/library/cc667410.aspx – Guru Kara Apr 19 '13 at 09:36

5 Answers5

1

Short answer is that your optimization is no optimization at all.
Basically, String.Contains(...) just returns String.IndexOf(..) >= 0
You could improve your alogrithm to:

int position = text.IndexOf(colorchar.ToString()...);
if (-1 < position)
{  /* Do it */ }
Vegard Innerdal
  • 349
  • 2
  • 10
0

Yes.

And doesn't have a bug (ahhm...).

There are better ways of looking for multiple substrings in very long texts, but for most common usages String.Contains (or IndexOf) is the best.

Also IIRC the source of String.Contains is available in the .Net shared sources

Oh, and if you want a performance comparison you can just measure for your exact use-case

AK_
  • 7,981
  • 7
  • 46
  • 78
0

Check this similar post How does string.contains work

I think that you will not be able to simply do anything faster than String.Contains, unless you want to use standard CRT function wcsstr, available in msvcrt.dll, which is not so easy

Community
  • 1
  • 1
Hassan Mokdad
  • 5,832
  • 18
  • 55
  • 90
0

Unless you have profiled your application and determined that the line with String.Contains is a bottle-neck, you should not do any such premature optimizations. It is way more important to keep your code's intention clear.

Ans while there are many ways to implement the methods in the .NET base classes, you should assume the default implementations are optimal enough for most people's use cases. For example, any (future) implementation of .NET might use the x86-specific instructions for string comparisons. That would then always be faster than what you can do in C#.

If you really want to be sure whether your custom string comparison code is faster than String.Contains, you need to measure them both using many iterations, each with a different string. For example using the Stopwatch class to measure the time.

Daniel A.A. Pelsmaeker
  • 47,471
  • 20
  • 111
  • 157
0

If you now the details which you can use for optimizations (not just simple contains check) sure you can make your method faster than string.Contains, otherwise - not.

Vlad
  • 3,001
  • 1
  • 22
  • 52