We've got a slight performance issue in a section of code that uses LINQ, and it has raised a question about how LINQ performs lookups.
My question is this (please note that I have changed all the code, so this is an indicative example rather than the real scenario):
Given
public class Person {
    public int ID { get; set; }
    public string Name { get; set; }
    public DateTime Birthday { get; set; }
    public int OrganisationID { get; set; }
}
If I had a list of, say, 100k Person objects and a list of, say, 1000 dates, and I ran this code:
var personBirthdays = from Person p in personList
                      where p.OrganisationID == 123
                      select p.Birthday;
foreach (DateTime d in dateList)
{
    if (personBirthdays.Contains(d))
        Console.WriteLine("Date: {0} has a Birthday", d.ToShortDateString());
}
As I understand it, personBirthdays is a deferred query, so every Contains call re-executes it from scratch. The resulting work would be:
1000 (the number of dates in the list)
multiplied by
100k + x (each pass scans all of personList for OrganisationID 123, and compares the x matching birthdays against the date)
so roughly 1000 * (100k + x). This is a lot of iterations!
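To show what I mean by the query re-executing, here is a minimal sketch (assuming the Person class above and a tiny two-person list; the counter inside the where clause is just for illustration) that counts how many times the filter actually runs:

int filterRuns = 0;
IEnumerable<DateTime> personBirthdays = personList
    .Where(p => { filterRuns++; return p.OrganisationID == 123; })
    .Select(p => p.Birthday);

// Each Contains call enumerates the deferred query again,
// re-running the where clause for every person it visits
// (it only stops early if it finds a match).
personBirthdays.Contains(new DateTime(1990, 1, 1));
personBirthdays.Contains(new DateTime(1990, 1, 1));

Console.WriteLine(filterRuns); // 4: the 2-person list was filtered twice

If the query were cached, filterRuns would stay at 2 after the first enumeration.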
If I changed the personBirthdays code to this:
List<DateTime> personBirthdays =
    (from Person p in personList
     where p.OrganisationID == 123
     select p.Birthday).ToList();
This should mean the 100k scan happens only once, up front, rather than once per date?
So you would have 100k + (1000 * x) instead of 1000 * (100k + x).
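Presumably you could go one step further and put the birthdays into a HashSet<DateTime>, so each date check is a hash lookup rather than a scan of the x birthdays (just a sketch of what I mean, not tested):

var birthdaySet = new HashSet<DateTime>(
    from Person p in personList
    where p.OrganisationID == 123
    select p.Birthday);

foreach (DateTime d in dateList)
{
    if (birthdaySet.Contains(d)) // O(1) hash lookup instead of scanning x birthdays
        Console.WriteLine("Date: {0} has a Birthday", d.ToShortDateString());
}

That would bring it down to roughly 100k + 1000 operations.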
The thing is, this seems too easy, and I suspect LINQ is doing something clever somewhere that means this doesn't actually happen.
If no one answers, I'll run some tests and report back.
Clarity update:
We're not considering database lookups; personList is an in-memory list. This is all LINQ to Objects.