0

I have created Search algorithm to search any string in the Text.I just want the count of matches returned and want to search by only one key. I have tested the performance by searching single character 'a' in file of size 2 mb and it takes around 7 seconds.Can you please suggest better algorithm or if i am missing anything here.

   public int SearchText(string fromString,string searchText,bool isCaseSensitive)
       {
           int searchTextLength=searchText.Length;
           if (!isCaseSensitive)
           {
               fromString = fromString.ToLower();
               searchText = searchText.ToLower();
           }
            int matchCount=0;
            while (fromString.Length > searchText.Length)
            {
                int firstMatchIndex=fromString.IndexOf(searchText[0]);
                if (firstMatchIndex > -1)
                {
                    if (searchText == fromString.Substring(firstMatchIndex, searchTextLength))
                        matchCount++;
                    fromString = fromString.Remove(0, firstMatchIndex + searchTextLength);
                }
                else
                    break;
            }
            return matchCount;
       }
Rishikesh
  • 486
  • 6
  • 15

2 Answers2

1

You are creating unnecessary temporary strings all over the place. You can change it to something like this.. which should be faster:

public int SearchText(string fromString, string searchText, bool isCaseSensitive) {
    int matchCount = 0;

    var comparison = isCaseSensitive
        ? StringComparison.InvariantCulture
        : StringComparison.InvariantCultureIgnoreCase;

    int foundIndex = fromString.IndexOf(searchText,
        comparison);

    while (foundIndex > -1) {
        foundIndex = fromString.IndexOf(searchText,
            foundIndex + 1,
            comparison);

        matchCount++;
    }

    return matchCount;
}

EDIT: I tested this. It takes 197ms to process 2MB of randomized data.

Simon Whitehead
  • 63,300
  • 9
  • 114
  • 138
  • Thanks you!!You are right. I have checked your code with same input and indeed it takes less than a second.I should have just checked full text using IndexOf and not one character first. I think i was lost somewhere while writing the code. – Rishikesh Dec 11 '13 at 05:36
1

Try using regular expressions

public int SearchText(string fromString, string searchText, bool isCaseSensitive)
{
        RegexOptions options = isCaseSensitive ? 
            RegexOptions.None : RegexOptions.IgnoreCase;
        return Regex.Matches(fromString, Regex.Escape(searchText), options).Count;
}

EDIT I tested this in LinqPad, and it takes 113 ms to get the count from 2.5 MB of Lorem Ipsum.

damienc88
  • 1,957
  • 19
  • 34
  • 1
    Remember to escape the input, otherwise it may produce unexpected output. – Nickolai Nielsen Dec 11 '13 at 05:24
  • Thank you for your alternate approach. I always tend to overlook Regex's power. TEsted it too and it take around 60-80ms similar to that of approach suggested by simon which in range of (50-70 ms). Tested by executing both the approaches 10 times. – Rishikesh Dec 11 '13 at 06:01