The answer will depend on the state of your collection.
- If most entities will pass the Where test, apply Select first;
- If only a few entities will pass the Where test, apply Where first.
Update:
@YeldarKurmangaliyev wrote an answer with a concrete example and benchmarking. I ran similar code to verify his claim, and our results are exactly opposite. That is because I ran the same test as his, but with an object that is not as simple as the Point type he used.
The code looks very much like his, except that I renamed the class from Point to EnumerableClass.
Below are the classes that make up EnumerableClass:
using System;
using System.Drawing; // assumed source of Color (Color.Orange)

// The "heavy" item type used in place of the simple Point class.
public class EnumerableClass
{
    public int X { get; set; }
    public int Y { get; set; }
    public String A { get; set; }
    public String B { get; set; }
    public String C { get; set; }
    public String D { get; set; }
    public String E { get; set; }
    public Frame F { get; set; }
    public Gatorade Gatorade { get; set; }
    public Home Home { get; set; }
}

public class Home
{
    private Home(int rooms, double bathrooms, Stove stove, InternetConnection internetConnection)
    {
        Rooms = rooms;
        Bathrooms = (decimal) bathrooms;
        StoveType = stove;
        Internet = internetConnection;
    }

    public int Rooms { get; set; }
    public decimal Bathrooms { get; set; }
    public Stove StoveType { get; set; }
    public InternetConnection Internet { get; set; }

    // Factory method: returns a Home with fixed sample values.
    public static Home GetUnitOfHome()
    {
        return new Home(5, 2.5, Stove.Gas, InternetConnection.Att);
    }
}

public enum InternetConnection
{
    Comcast = 0,
    Verizon = 1,
    Att = 2,
    Google = 3
}

public enum Stove
{
    Gas = 0,
    Electric = 1,
    Induction = 2
}

public class Gatorade
{
    private Gatorade(int volume, Color liquidColor, int bottleSize)
    {
        Volume = volume;
        LiquidColor = liquidColor;
        BottleSize = bottleSize;
    }

    public int Volume { get; set; }
    public Color LiquidColor { get; set; }
    public int BottleSize { get; set; }

    // Factory method: returns a Gatorade with fixed sample values.
    public static Gatorade GetGatoradeBottle()
    {
        return new Gatorade(100, Color.Orange, 150);
    }
}

public class Frame
{
    public int X { get; set; }
    public int Y { get; set; }

    private Frame(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Factory method: returns a Frame with fixed sample values.
    public static Frame GetFrame()
    {
        return new Frame(5, 10);
    }
}
The classes Frame, Gatorade and Home each have a static method that returns an instance of their type.
Below is the main program:
using System;
using System.Diagnostics;
using System.Linq;

public static class Program
{
    const string Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static readonly Random Random = new Random();

    // Builds a random string of the given length from Chars.
    private static string RandomString(int length)
    {
        return new string(Enumerable.Repeat(Chars, length)
            .Select(s => s[Random.Next(s.Length)]).ToArray());
    }

    private static void Main()
    {
        var random = new Random();

        // One million large objects, each carrying several long strings.
        var largeCollection =
            Enumerable.Range(0, 1000000)
                .Select(
                    x =>
                        new EnumerableClass
                        {
                            A = RandomString(500),
                            B = RandomString(1000),
                            C = RandomString(100),
                            D = RandomString(256),
                            E = RandomString(1024),
                            F = Frame.GetFrame(),
                            Gatorade = Gatorade.GetGatoradeBottle(),
                            Home = Home.GetUnitOfHome(),
                            X = random.Next(1000),
                            Y = random.Next(1000)
                        })
                .ToList();

        const int conditionValue = 250;
        Console.WriteLine(@"Condition value: {0}", conditionValue);

        // Filter first, then project.
        var sw = new Stopwatch();
        sw.Start();
        var firstWhere = largeCollection
            .Where(x => x.Y < conditionValue)
            .Select(x => x.Y)
            .ToArray();
        sw.Stop();
        Console.WriteLine(@"Where -> Select: {0} ms", sw.ElapsedMilliseconds);

        // Project first, then filter.
        sw.Restart();
        var firstSelect = largeCollection
            .Select(x => x.Y)
            .Where(y => y < conditionValue)
            .ToArray();
        sw.Stop();
        Console.WriteLine(@"Select -> Where: {0} ms", sw.ElapsedMilliseconds);

        Console.ReadLine();
        Console.WriteLine();
        Console.WriteLine(@"First Where's first item: {0}", firstWhere.FirstOrDefault());
        Console.WriteLine(@"First Select's first item: {0}", firstSelect.FirstOrDefault());
        Console.WriteLine();
        Console.ReadLine();
    }
}
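The program above times each query exactly once, so a single measurement may also include one-time costs such as JIT compilation. As a rough sketch of how the timing could be made less noisy, the helper below runs a query once as a warm-up and then reports the median over several runs. The names BenchmarkSketch and MedianMilliseconds and the default of 11 runs are my own assumptions for illustration; they are not part of the original test.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class BenchmarkSketch
{
    // Hypothetical helper (not part of the original test): runs the query once
    // as a warm-up, then returns the median elapsed time over several runs.
    public static double MedianMilliseconds(Func<int[]> query, int runs = 11)
    {
        if (runs < 1) throw new ArgumentOutOfRangeException(nameof(runs));

        query(); // warm-up run, deliberately not measured

        var timings = new double[runs];
        for (var i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            query();
            sw.Stop();
            timings[i] = sw.Elapsed.TotalMilliseconds;
        }

        Array.Sort(timings);
        return timings[runs / 2]; // middle element of the sorted timings
    }
}
```

Inside Main it could be called as BenchmarkSketch.MedianMilliseconds(() => largeCollection.Where(x => x.Y < conditionValue).Select(x => x.Y).ToArray()), and likewise for the Select-first query.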
Results:
I ran the tests multiple times and found that .Select().Where() performed better than .Where().Select() when the collection size is 1000000.
Here is the first test result, where I forced every EnumerableClass object's Y value to be 5, so every item passed Where:
Condition value: 250
Where -> Select: 149 ms
Select -> Where: 115 ms
First Where's first item: 5
First Select's first item: 5
Here is the second test result, where I forced every EnumerableClass object's Y value to be 251, so no item passed Where:
Condition value: 250
Where -> Select: 110 ms
Select -> Where: 100 ms
First Where's first item: 0
First Select's first item: 0
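The two forced-value runs above can be reproduced in miniature. The sketch below is only an illustration: Item is a hypothetical stand-in for EnumerableClass that keeps nothing but Y, and ForcedValueSketch, Build and CountMatches are names I made up for it. It shows that both orderings keep the same items, whichever one is faster.

```csharp
using System;
using System.Linq;

public static class ForcedValueSketch
{
    // Hypothetical stand-in for EnumerableClass; only Y matters here.
    public sealed class Item
    {
        public int Y { get; set; }
    }

    // Builds a collection in which every item has the same Y value.
    public static Item[] Build(int count, int forcedY)
    {
        return Enumerable.Range(0, count)
            .Select(_ => new Item { Y = forcedY })
            .ToArray();
    }

    // Runs both orderings and returns how many items each one kept.
    public static (int WhereFirst, int SelectFirst) CountMatches(Item[] items, int conditionValue)
    {
        var whereFirst = items.Where(x => x.Y < conditionValue).Select(x => x.Y).ToArray();
        var selectFirst = items.Select(x => x.Y).Where(y => y < conditionValue).ToArray();
        return (whereFirst.Length, selectFirst.Length);
    }
}
```

With a forced Y of 5 and a condition value of 250, both counts equal the collection size; with a forced Y of 251, both are zero. In that second case FirstOrDefault() on the empty int array returns default(int), which is why the second test prints 0 as the first item.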
Clearly, the result is so dependent on the state of the collection that:
- In @YeldarKurmangaliyev's tests .Where().Select() performed better; and,
- In my tests .Select().Where() performed better.
The state of the collection, which I keep mentioning, includes:
- the size of each item;
- the total number of items in the collection; and,
- the number of items likely to pass the Where clause.
Response to comments on the answer:
Further, @Enigmativity said that needing to know the result of Where ahead of time, in order to decide whether to put Where or Select first, is a Catch-22. In theory he is correct and, not surprisingly, the same situation arises in another area of Computer Science: scheduling.
Shortest Job First, which gives the lowest average waiting time, schedules first the job that will run for the least time. But how would anyone know how long a particular job will take to complete? The answer is:
Shortest job next is used in specialized environments where accurate estimates of running time are available.
Therefore, as I said right at the top (which was also the first, shorter version of my answer), the correct answer to this question will depend on the current state of the collection.
In general,
- if your objects are within a reasonable size range; and,
- you are Selecting only a very small part of each object; and,
- your collection holds far more than a few thousand items,
then the guideline mentioned right at the top of this answer will be useful for you.
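Just as a scheduler works from an estimate of running time, the pass rate of Where can be estimated rather than known exactly. The following is a minimal sketch under my own assumptions, not something I benchmarked: OrderingSketch, EstimatePassRate and FilterProjected are hypothetical names, the sample size of 1000 and the 0.5 threshold are arbitrary, and the sampling itself costs time that has to be weighed against any gain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingSketch
{
    // Estimates the fraction of items that pass the predicate by testing
    // roughly sampleSize evenly spaced items instead of the whole collection.
    public static double EstimatePassRate<T>(
        IReadOnlyList<T> items, Func<T, bool> predicate, int sampleSize = 1000)
    {
        if (items.Count == 0) return 0.0;

        var step = Math.Max(1, items.Count / sampleSize);
        var sampled = 0;
        var passed = 0;
        for (var i = 0; i < items.Count; i += step)
        {
            sampled++;
            if (predicate(items[i])) passed++;
        }
        return (double)passed / sampled;
    }

    // Applies the guideline from the top of the answer: Select first when most
    // items are expected to pass, Where first otherwise. Both branches return
    // the same values; only the order of the operators differs.
    public static int[] FilterProjected<T>(
        IReadOnlyList<T> items, Func<T, int> selector, Func<int, bool> predicate,
        double threshold = 0.5) // assumed cut-off for "most items pass"
    {
        var passRate = EstimatePassRate(items, x => predicate(selector(x)));
        return passRate > threshold
            ? items.Select(selector).Where(predicate).ToArray()
            : items.Where(x => predicate(selector(x))).Select(selector).ToArray();
    }
}
```

For the test program above, the call would be OrderingSketch.FilterProjected(largeCollection, x => x.Y, y => y < conditionValue).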