As already posted, you can use a for loop and Skip some elements and Take some elements. In this way you create a new query in every iteration of the for loop. But a problem arises if you also want to go through each of those queries, because this will be very inefficient. Let's assume you have just 50 entries and you want to go through your list ten elements at a time. You will have 5 iterations doing
- .Skip(0).Take(10)
- .Skip(10).Take(10)
- .Skip(20).Take(10)
- .Skip(30).Take(10)
- .Skip(40).Take(10)
Here two problems arise.
- Skipping elements can still lead to computation. In your first query you calculate only the 10 elements you need, but in your second iteration you calculate 20 elements and throw 10 away, and so on. If you sum all 5 iterations together, you have computed 10 + 20 + 30 + 40 + 50 = 150 elements even though you only had 50 elements. This results in O(n^2) performance.
- Not every IEnumerable does the above. Some IEnumerable implementations, a database for example, can optimize a Skip, for example by using an Offset (MySQL) definition in the SQL query. But that still doesn't solve the problem. The main problem you still have is that you create 5 different queries and execute all 5 of them. Those five queries will now take the most time, because even a simple query to a database is a lot slower than just skipping some in-memory elements or doing some computations.
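To make the first point concrete, here is a minimal sketch that counts how many elements a lazy in-memory source really produces when you page through it with Skip and Take. The class SkipTakeCostDemo, the Source generator and the Produced counter are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeCostDemo
{
    // Number of elements the source has produced so far.
    public static int Produced;

    // Hypothetical lazy source: bumps Produced for every element it yields.
    public static IEnumerable<int> Source(int total)
    {
        for (var i = 0; i < total; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Pages through the source with Skip/Take and reports how many
    // elements had to be produced in total.
    public static int CountProducedWithSkipTake(int total, int pageSize)
    {
        Produced = 0;
        for (var skipped = 0; skipped < total; skipped += pageSize)
        {
            // ToList() forces the query for this page to run.
            Source(total).Skip(skipped).Take(pageSize).ToList();
        }
        return Produced;
    }
}
```

Calling SkipTakeCostDemo.CountProducedWithSkipTake(50, 10) should report 150 produced elements for a source of only 50, which matches the sum above.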
Because of all these problems it makes sense not to use a for loop with multiple .Skip(x).Take(y) if you also want to evaluate every query in every iteration. Instead your algorithm should go through your IEnumerable only once, executing the query once, and on the first iteration return the first 10 elements. The next iteration returns the next 10 elements and so on, until it runs out of elements.
The following Extension Method does exactly this.
public static IEnumerable<IReadOnlyList<T>> Combine<T>(this IEnumerable<T> source, int amount) {
    var combined = new List<T>();
    var counter = 0;
    foreach ( var entry in source ) {
        combined.Add(entry);
        // Chunk is full: hand it out and start a new one
        if ( ++counter >= amount ) {
            yield return combined;
            combined = new List<T>();
            counter = 0;
        }
    }
    // Return the last, possibly smaller, chunk
    if ( combined.Count > 0 )
        yield return combined;
}
With this you can just do someEnumerable.Combine(100) and you get a new IEnumerable<IReadOnlyList<T>> that goes through your enumeration just once, slicing everything into chunks with a maximum of 100 elements.
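As a side note, if you are on .NET 6 or later, the framework itself ships Enumerable.Chunk, which does the same kind of single-pass slicing (it yields T[] arrays). A minimal sketch, assuming .NET 6+; the names ChunkDemo and ChunkSizes are made up for this illustration:

```csharp
using System;
using System.Linq;

public static class ChunkDemo
{
    // Slices the numbers 1..total into chunks of at most `size` elements
    // and returns the length of every chunk.
    public static int[] ChunkSizes(int total, int size)
    {
        return Enumerable.Range(1, total)
            .Chunk(size)            // built-in since .NET 6
            .Select(c => c.Length)
            .ToArray();
    }
}
```

ChunkDemo.ChunkSizes(25, 10) gives the sizes 10, 10 and 5, the same slicing the Combine Extension Method above would produce.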
Just to show how big the performance difference can be:
// Needs: using System; using System.Diagnostics; using System.Linq;
var numberCount = 100000;
var combineCount = 100;
var nums = Enumerable.Range(1, numberCount);
var count = 0;

// Benchmark with Combine() Extension
var swCombine = Stopwatch.StartNew();
var sumCombine = 0L;
var pages = nums.Combine(combineCount);
foreach ( var page in pages ) {
    sumCombine += page.Sum();
    count++;
}
swCombine.Stop();
Console.WriteLine("Count: {0} Sum: {1} Time Combine: {2}", count, sumCombine, swCombine.Elapsed);

// Doing it with .Skip(x).Take(y)
var swTakes = Stopwatch.StartNew();
count = 0;
var sumTaken = 0L;
var alreadyTaken = 0;
while ( alreadyTaken < numberCount ) {
    sumTaken += nums.Skip(alreadyTaken).Take(combineCount).Sum();
    alreadyTaken += combineCount;
    count++;
}
swTakes.Stop();
Console.WriteLine("Count: {0} Sum: {1} Time Takes: {2}", count, sumTaken, swTakes.Elapsed);
Using the Combine() Extension Method runs in 3 milliseconds on my computer (i5 @ 4GHz), while the Skip/Take loop already needs 178 milliseconds. If you have a lot more elements or the slices are smaller, it gets even worse. For example, if combineCount is set to 10 instead of 100, the runtimes change to 4 milliseconds and 1800 milliseconds (1.8 seconds).
Now you could say that you don't have that many elements or that your slices never get that small. But remember, in this example I just generated a sequence of numbers that has nearly zero computation time. The whole overhead from 4 milliseconds to 178 milliseconds is caused only by the re-evaluation and Skipping of values. If you have more complex stuff going on behind the scenes, the Skipping creates the most overhead. And if an IEnumerable can implement Skip itself, like a database as explained above, the example gets even worse, because most of the overhead will be the execution of the query itself.
And the number of queries can go up really fast. With 100,000 elements and a slicing/chunking of 100 you already execute 1,000 queries. The Combine Extension provided above, on the other hand, always executes your query once, and never suffers from any of the problems described above.
None of that means that Skip and Take should be avoided. They have their place. But if you really plan to go through every element, you should avoid using Skip and Take to get your slicing done.
If the only thing you want is to slice everything into pages of 100 elements and fetch just the third page, for example, you should just calculate how many elements you need to Skip.
var pageCount = 100;
var pageNumberToGet = 3;
var thirdPage = yourEnumerable.Skip(pageCount * (pageNumberToGet - 1)).Take(pageCount);
In this way you get the 201st through 300th elements (zero-based indices 200 to 299) in a single query. An IEnumerable backed by a database can also optimize that, and you have just a single query. So, if you only want a specific range of elements from your IEnumerable, then you should use Skip and Take as above instead of the Combine Extension Method that I provided.
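If you need single pages in more than one place, the calculation can be wrapped in a small helper. This is only a sketch; PageDemo and GetPage are hypothetical names, and the page number is assumed to be 1-based as in the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageDemo
{
    // Hypothetical helper: returns the page with the given 1-based number.
    // Only one Skip/Take query is built, so the source is evaluated once.
    public static IEnumerable<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        return source.Skip(pageSize * (pageNumber - 1)).Take(pageSize);
    }
}
```

For example, PageDemo.GetPage(Enumerable.Range(0, 1000), 3, 100) yields 100 numbers, starting at 200 and ending at 299.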