0

This following code works perfectly fine on a small data set. However, the GetMatchCount and BuildMatchArrary are very sluggish on large result. Can anyone recommend any different approach so save processing time? Would it be better to write the array to a file? Are lists just generally slow and not the best option?

using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Client
{
    public int Id;

    public string FirstName
    {
        get
        {
            var firstName = //<call to get from database via Id>

            return firstName;
        }
    }

    public string MiddleName
    {
        get
        {
            var middleName =  //<call to get from database via Id>

            return middleName;
        }
    }

    public string LastName
    {
        get
        {
            var lastName =  //<call to get from database via Id>

            return lastName;
        }
    }

    public string FullName
    {
        get
        {
            return FirstName + " " + MiddleName + " " + LastName;
        }
    }

    public int GetMatchCount(IEnumerable<string> clientFirstNames, IEnumerable<string> clientMiddleNames, IEnumerable<string> clientLastNames)
    {
        var clientFullNames = BuildMatchArray(clientFirstNames, clientMiddleNames, clientLastNames);
        return clientFullNames.Count(x => x == FullName);
    }


    public string[] BuildMatchArray(IEnumerable<string> clientFirstNames, IEnumerable<string> clientMiddleNames, IEnumerable<string> clientLastNames)
    {
        Debug.Assert(clientFirstNames.Count() == clientMiddleNames.Count() && clientMiddleNames.Count() == clientLastNames.Count());

        var clientFullNames = new List<string>();
        for (int i = 0; i < clientFirstNames.Count(); i++)
        {
            clientFullNames.Add(clientFirstNames.ElementAt(i) + " " + clientMiddleNames.ElementAt(i) + " " + clientLastNames.ElementAt(i));
        }
        return clientFullNames.ToArray();
    }
}
  • Can you tell us more about the call to the database? Are your first middle and last names stored in separate tables for some reason? Is each property indexed? As it is, even getting one full name looks like it requires 3 select statements in your database. Could that be trimmed? – NathanTempelman Apr 10 '18 at 23:43
  • 1
    seems like it needs triple zip https://stackoverflow.com/questions/10297124/how-to-combine-more-than-two-generic-lists-in-c-sharp-zip, and a `Lookup` – Slai Apr 10 '18 at 23:54
  • Yes, you could also use the proposed `ZipThree` method in one of the answers on that post. `firsts.ZipThree(middles, lasts, (f, m, l) => $"{f} {m} {l}")`. This is probably a better and more reusable solution than my answer. – JamesFaix Apr 11 '18 at 00:34

1 Answers1

1

Where are you getting these strings? If you are using lazy sequences, every time you call Count() you will have to iterate the entire sequence to count how many objects are in the sequence. If the IEnumerable<T> is really a T[] or List<T>, then Count() is optimized to just call the Length or Count property, which isn't expensive. Similarly, ElementAt is also very inefficient and iterates the collection. So with an in-memory lazy sequence this performance will be bad, but if you are streaming results from SQL or an external source, it will be really bad or possibly even incorrect.

A more performant implementation of of BuildMatchArray would be like this:

public IEnumerable<string> ZipNames(IEnumerable<string> firsts, 
    IEnumerable<string> middles, IEnumerable<string> lasts) 
{
    using(var e1 = firsts.GetEnumerator())
    using(var e2 = middles.GetEnumerator())
    using(var e3 = lasts.GetEnumerator())
    {
        var stop = false;

        while (!stop)
        {
            var hasNext1 = e1.MoveNext();
            var hasNext2 = e2.MoveNext();
            var hasNext3 = e3.MoveNext();

            if (hasNext1 && hasNext2 && hasNext3) 
            {
                yield return $"{e1.Current} {e2.Current} {e3.Current}";
            }
            else
            {
                stop = true;
                Debug.Assert(!(hasNext1 || hasNext2 || hasNext3));
            }
        }
    }
}

This requires only one iteration of each input collection, and doesn't need to copy elements to a new List<T>. Another point to note, is that List<T> starts with capacity for 4 elements, and when it fills up, it copies all elements to a new list with double capacity. So if you have a large sequence, you will copy many times.

This implementation is very similar to System.Linq.Enumerable.Zip

In your case, you also shouldn't do a ToArray to your sequence. This will require another copying, and can potentially be a huge array. If you are only sending that array to .Count(x => x == y), then keeping a lazy IEnumerable would be better, because Count operates lazily for lazy sequences and streams data in and counts elements as it sees them, without ever requiring the full collection to be in memory.

See IEnumerable vs List - What to Use? How do they work?

JamesFaix
  • 8,050
  • 9
  • 37
  • 73
  • 1
    Shouldn't there be some kind of loop..? – Thomas Hilbert Apr 11 '18 at 00:19
  • Yes @ThomasHilbert, my mistake, it should be a `while` not an `if`/`else` – JamesFaix Apr 11 '18 at 00:20
  • 1
    Iterators should be `Move()`d inside that loop :-) – Thomas Hilbert Apr 11 '18 at 00:23
  • You wouldn't want to open a `using` block inside a loop, you need to call `MoveNext` and `Current` in the loop, in order to get a different result each time. – JamesFaix Apr 11 '18 at 00:24
  • 1
    Well currently neither the iterator positions nor hasNext[1,2,3] will change and the loop will just spin forever. – Thomas Hilbert Apr 11 '18 at 00:25
  • You are correct. My loop is going through everything over each count. My fried brain today didn't even pick up on that. I will test this out tomorrow but I think your solution will work perfectly. Shouldn't the debug.assert be inside the else. The hasNext won't be seen by the compiler outside the loop. – user2705495 Apr 11 '18 at 00:44
  • @user2705495 Right you are. My code examples are sloppy tonight, I think its time for me to get off SO! – JamesFaix Apr 11 '18 at 01:11