13

I have a list of about 10,000 staff members in a List<T> and I have a ListBox which contains a subset of those staff, depending on the search term in a text box.

Say a Staff object has the following publicly exposed properties:

string FirstName
string LastName
string MiddleName
   int StaffID
   int CostCentre

I could write a function like this:

bool staffMatchesSearch(Staff stf)
{
  if (tbSrch.Text.Trim() == string.Empty)
    return true; // No search = match always.

  string s = tbSrch.Text.Trim().ToLower();

  // Do the checks in the order most likely to return soonest:
  if (stf.LastName.ToLower().Contains(s))
    return true;
  if (stf.FirstName.ToLower().Contains(s))
    return true;
  if (stf.MiddleName.ToLower().Contains(s))
    return true;

  if (stf.CostCentre.ToString().Contains(s))
    return true; // Yes, we want partial matches on CostCentre
  if (stf.StaffID.ToString().Contains(s))
    return true; // And also on StaffID

  return false;
}

and then do something like:

tbSrch_TextChanged(object sender, EventArgs e)
{
  lbStaff.BeginUpdate();
  lbStaff.Items.Clear();

  foreach (Staff stf in staff)
    if (staffMatchesSearch(stf))
      lbStaff.Items.Add(stf);

  lbStaff.EndUpdate();
}

The filtering is re-evaluated every time the user changes the contents of the tbSrch box.

This works, and it's not awfully slow, but I was wondering if I could make it any faster?

I have tried to re-write the whole thing to be multi-threaded, however with only 10,000 staff members the overhead seemed to take away the bulk of the benefit. Also, there were a bunch of other bugs like if searching for "John", the user first presses "J" which spools up the threads, but when the user presses the "o" another set are spooled up before the first lot have had a chance to return their results. A lot of the time, the results get returned in a jumbled order and all sorts of nasty things happen.

I can think of a few tweaks that would make the best-case scenario significantly better, but they would also make the worst-case scenario a lot worse.

Do you have any ideas on how this can be improved?


Great suggestions I've implemented far, and their results:

  • Add a delay on the ValueChanged event so that if the user types a 5-character name quickly on the keyboard, it only performs 1 search at the end rather than 5 in series.
  • Pre-evaluate ToLower() and store in the Staff class (as a [NonSerialized] attribute so it doesn't take up extra space in the save file).
  • Add a get property in Staff which returns all the search criteria as a single, long, lower-case, concatenated string. Then run a single Contains() on that. (This string is stored in the Staff object so it only gets constructed once.)

So far, these have lowered search times from around 140ms to about 60ms (though these numbers are highly subjective depending on the actual search performed and number of results returned).

Ozzah
  • 10,631
  • 16
  • 77
  • 116
  • do you really want to `toString` those `int`s? Seems like you'd want a matches string method and matches int method... I mean, if I'm in center 13, I shouldn't show up because someone searches for center 1 or center 3... – corsiKa Nov 16 '11 at 03:56
  • Try implementing one of the Boyer-Moore family of string searching algorithms? Preprocessing either the search term, or the Staff objects and reusing the result could save a lot of time. – millimoose Nov 16 '11 at 03:57
  • Do your really want to search for all the matches for each of the staff properties each time? As a user, I would want to only search on one or two known fields at a time. – ChandlerPelhams Nov 16 '11 at 03:57
  • @glowcoder This is exactly the behaviour they are looking for. – Ozzah Nov 16 '11 at 04:01
  • Can you wait until maybe two letters are entered before searching, or only do a search if they enter an asterisk? – Daryl Nov 16 '11 at 04:01
  • @ChandlerPelhams This is what the user has asked for. A flexible search box which allows them to quickly narrow down the full staff list by a variety of criteria. :) – Ozzah Nov 16 '11 at 04:02
  • @Daryl I think the problem is that the narrowing is *unresponsive*, not that it consumes too much CPU on the whole. Only searching after three characters or an asterisk are entered just delays the problem. – millimoose Nov 16 '11 at 04:09
  • Would you want to possibly change the search functions from 'contains' to only look at the first x characters of the string to compare to where x is the length of your search string from the search box. If I were trying to search for a person with the first name of 'Mary', I would would also get 'Amy' or 'Adam'... – ChandlerPelhams Nov 16 '11 at 04:11

5 Answers5

7

1) as pointed out in the comments, you probably shouldn't .ToString the numeric fields - just match the numbers

2) the ToLower calls are a perf drain. Add lowercased versions of these properties to the Staff class so the ToLower only has to be done once

3) when the user enters another character, you don't need to reevaluate all items. entering a character will only reduce the number of matches so only reevaluate the previous matches.

4) use a timer to introduce a delay between when the user types and when you kick off a search. if they are typing multiple characters quickly, you might as well wait until they've paused for half a second

5) Check the key pressed. If NaN then don't check the int properties.

BlueChippy
  • 5,935
  • 16
  • 81
  • 131
Robert Levy
  • 28,747
  • 6
  • 62
  • 94
  • 2
    "when the user enters another character, you don't need to reevaluate all items. entering a character will only reduce the number of matches so only reevaluate the previous matches" This will only work if they type additional characters at the end. It is a special-case optimization. Might be worth doing, but it won't speed up all operations, and could make logic more complicated for other operations. – Merlyn Morgan-Graham Nov 16 '11 at 04:13
  • 1
    @MerlynMorgan-Graham fair point but this special case is probably the most common case so probably worth the code complexity – Robert Levy Nov 16 '11 at 04:15
  • 1
    Perhaps also: Check the key pressed - if its NaN then don't check the int properties. – BlueChippy Nov 16 '11 at 04:16
  • @BlueChippy - nice. i marked this answer as a wiki so feel free to add that to the list – Robert Levy Nov 16 '11 at 04:19
  • Thanks @RobertLevy also wondering: ToLower() all the properties that you are searching on, and also ToLower all input strings on keypress. Then the search must always be case insensitive (as all are lower). – BlueChippy Nov 16 '11 at 04:54
2

You could add a 'SearchTerm' private property to the Staff object that's (FirstName + LastName + MiddleName + StaffID + CostCentre).ToLower() , and do the Contains() check on that instead. This would stop you having to do a ToLower() on each string each time and reduce the number of Contains() checks from 5 to 1.

Michael Low
  • 24,276
  • 16
  • 82
  • 119
  • this reduction in Contains calls will have negligible benefits (the one big call is doing just as much work as the smaller ones) – Robert Levy Nov 16 '11 at 04:18
  • Yes the changes to Contains() wouldn't be a huge difference, I think the main benefit of this approach would be removing the need to keep on calling ToLower() each time. – Michael Low Nov 16 '11 at 04:22
  • The issue is, I have deliberately split it up into multiple small `if`s so that in the best-case scenario it will bail out with a `return true` quickly. If I add all the possible search terms together in a `(string1 + string2 + ... + stringN).ToLower().Contains()` then I have to concatenate, process, and search through a much longer string. I think this would run slower (???). – Ozzah Nov 16 '11 at 04:29
  • I don't think so, as it would stop the Contains() check as soon as it gets a match whether it's a long string or short one. As this concatenated string would only be used for searching, you could do the ToLower call just once when setting the string compared to on each search as at present. You'd be best to profile it and see what your results are though. – Michael Low Nov 16 '11 at 04:37
  • 1
    I ran a bunch of tests and found that your method, on average, will give results in about 75-85ms whereas my method takes about 136ms on average. I am going to combine this with the pre-evaluated `ToLower()` and see how that goes! – Ozzah Nov 16 '11 at 05:16
  • 1
    It shaved another 20ms off. With the pre-evaluated lower-case strings, it only takes about 60ms on average, which is quite good! – Ozzah Nov 16 '11 at 05:25
2

You could try implementing a trie or "prefix-tree":

http://en.wikipedia.org/wiki/Trie

This would allow you to search for text that begins with the value.

I believe the linked article on suffix-trees would allow you to do a full text search, though it has higher storage requirements.

Make sure you ToLower all your data when inserting it into your structure so you don't have to do case insensitive comparisons while doing your lookup.

Merlyn Morgan-Graham
  • 58,163
  • 16
  • 128
  • 183
1

In seeing what you've done (mostly from comments on @mikel's answer), it sounds like you're getting there. One suggestion I haven't seen which could offer a speed increase is to do a comparison using StringComparison.OrdinalIgnoreCase. In your case, it would mean replacing

if (stf.LastName.ToLower().Contains(s))
  return true;

with

if (stf.LastName.IndexOf(s, StringComparison.OrdinalIgnoreCase) >= 0)
  return true;

Here is an MSDN link, discussing fast string comparisons.

Joel Rondeau
  • 7,486
  • 2
  • 42
  • 54
  • This is a really good idea and I'll definitely use it in future, but for this instance I think it's obsolete since I'm now pre-computing the lower-case versions. Thanks :) – Ozzah Nov 16 '11 at 21:40
  • Seems reasonable, although you should check out this question about upper vs lower case: http://stackoverflow.com/questions/9033/hidden-features-of-c#12137 – Joel Rondeau Nov 16 '11 at 22:00
0
using System;
using System.Text.RegularExpressions;
namespace PatternMatching1
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                Console.WriteLine("Please enter the first string.");
                string str = Console.ReadLine(); ;
                string replacestr = Regex.Replace(str, "[^a-zA-Z0-9_]+", " ");



                Console.WriteLine("Please enter the second string.");
                string str1 = Console.ReadLine(); ;
                string replacestr1 = Regex.Replace(str1, "[^a-zA-Z0-9_]+", " ");



                if (replacestr.Length == replacestr1.Length)
                {
                    char[] cFirst = replacestr.ToLower().ToCharArray();
                    char[] cSecond = replacestr1.ToLower().ToCharArray();

                    Array.Sort<char>(cFirst);
                    Array.Sort<char>(cSecond);

                    if ((new string(cFirst)).Equals((new string(cSecond))))
                        Console.WriteLine("Both String Same");
                    else
                        Console.WriteLine("Both String Not Same");
                }
                else
                    Console.WriteLine("oopsss, something going wrong !!!! try again");
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }
            Console.Read();
        }
    }
}

`

abdul
  • 157
  • 5
  • 18