46

How would one take a List (using LINQ) and break it into a List of Lists partitioning the original list on every 8th entry?

I imagine something like this would involve Skip and/or Take, but I'm still pretty new to LINQ.

Edit: Using C# / .Net 3.5

Edit2: This question is phrased differently than the other "duplicate" question. Although the problems are similar, the answers in this question are superior: Both the "accepted" answer is very solid (with the yield statement) as well as Jon Skeet's suggestion to use MoreLinq (which is not recommended in the "other" question.) Sometimes duplicates are good in that they force a re-examination of a problem.

Pretzel
  • 8,141
  • 16
  • 59
  • 84
  • Are you using VB or C#? The presence of iterators makes a big difference. – Craig Gidney Sep 22 '10 at 20:37
  • 1
    This is not a duplicate. The other question wanted a to break the list into sublists of every n-th element, so a list with elements 0, 8, 16, 24, etc and a list with elements 1, 9, 17, 25, etc. and a list with elements 2, 10, 18, etc. This user wants to break into a list with 0..7 and a list with 8..15 and a list with 16..24, similar to paging – Harald Coppoolse Feb 01 '16 at 16:04

7 Answers7

57

Use the following extension method to break the input into subsets

public static class IEnumerableExtensions
{
    public static IEnumerable<List<T>> InSetsOf<T>(this IEnumerable<T> source, int max)
    {
        List<T> toReturn = new List<T>(max);
        foreach(var item in source)
        {
                toReturn.Add(item);
                if (toReturn.Count == max)
                {
                        yield return toReturn;
                        toReturn = new List<T>(max);
                }
        }
        if (toReturn.Any())
        {
                yield return toReturn;
        }
    }
}
Handcraftsman
  • 6,863
  • 2
  • 40
  • 33
  • I'm going to try this now as this seems quite clever... The thought of "yield return" popped into my head while mulling this over, but I couldn't see a clear way to do it... I'll let you know how this works for me. – Pretzel Sep 22 '10 at 20:46
  • Wow! That's really frickin' cool. I'm going with this! Thanks for the help! :-) – Pretzel Sep 22 '10 at 20:48
46

We have just such a method in MoreLINQ as the Batch method:

// As IEnumerable<IEnumerable<T>>
var items = list.Batch(8);

or

// As IEnumerable<List<T>>
var items = list.Batch(8, seq => seq.ToList());
DarcyThomas
  • 1,218
  • 13
  • 30
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 1
    Cool, this is implemented very nicely, including a resultSelector (important for manipulating/ordering the inner list). – Kirk Woll Sep 22 '10 at 20:43
  • Phew. I thought perhaps I was a bit daft in not being able to figure this out. Good to see that there are some things that are "missing" from regular LINQ-to-Objects. :) – Pretzel Sep 22 '10 at 20:44
  • @Pretzel: It's not that this is impossible using plain old LINQ ... it's just that it's neither terribly efficient or easy to understand. See my answer for a "plain LINQ" example. – LBushkin Sep 22 '10 at 20:48
  • +1, Thanks for the link to this library. I'll see if I can use it in future projects. – Pretzel Sep 22 '10 at 20:56
  • Minor question - I happened to copy the `Batch` code, and Resharper warns me that in `return Batch(source, size, x => x);` the `x` is a possible multiple enumeration issue. Correct/Ignore? – drzaus Jun 11 '15 at 18:16
15

You're better off using a library like MoreLinq, but if you really had to do this using "plain LINQ", you can use GroupBy:

var sequence = new[] {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};

var result = sequence.Select((x, i) => new {Group = i/8, Value = x})
                     .GroupBy(item => item.Group, g => g.Value)
                     .Select(g => g.Where(x => true));

// result is: { {1,2,3,4,5,6,7,8}, {9,10,11,12,13,14,15,16} }

Basically, we use the version of Select() that provides an index for the value being consumed, we divide the index by 8 to identify which group each value belongs to. Then we group the sequence by this grouping key. The last Select just reduces the IGrouping<> down to an IEnumerable<IEnumerable<T>> (and isn't strictly necessary since IGrouping is an IEnumerable).

It's easy enough to turn this into a reusable method by factoring our the constant 8 in the example, and replacing it with a specified parameter. It's not necessarily the most elegant solution, and it is not longer a lazy, streaming solution ... but it does work.

You could also write your own extension method using iterator blocks (yield return) which could give you better performance and use less memory than GroupBy. This is what the Batch() method of MoreLinq does IIRC.

LBushkin
  • 129,300
  • 32
  • 216
  • 265
  • Thanks for your input. Yeah, it doesn't seem efficient and as you can imagine, I was struggling to understand how I could do it with regular LINQ. (I'm staring at your answer right now and I really don't understand it very well.) I'll have to fiddle with it more later. (Thanks again!) – Pretzel Sep 22 '10 at 20:53
  • 2
    The approach using `GroupBy()` breaks down if the sequence you're planning on batching is going to be extremely large (or infinite). As far as how it works - it creates an anonymous object which associates each item with it's index, and then groups this into a series of sequences based on divisibility by `8` (or any other non-zero constant). – LBushkin Sep 22 '10 at 20:56
5

It's not at all what the original Linq designers had in mind, but check out this misuse of GroupBy:

public static IEnumerable<IEnumerable<T>> BatchBy<T>(this IEnumerable<T> items, int batchSize)
{
    var count = 0;
    return items.GroupBy(x => (count++ / batchSize)).ToList();
}

[TestMethod]
public void BatchBy_breaks_a_list_into_chunks()
{
    var values = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
    var batches = values.BatchBy(3);
    batches.Count().ShouldEqual(4);
    batches.First().Count().ShouldEqual(3);
    batches.Last().Count().ShouldEqual(1);
}

I think it wins the "golf" prize for this question. The ToList is very important since you want to make sure the grouping has actually been performed before you try doing anything with the output. If you remove the ToList, you will get some weird side effects.

nawfal
  • 70,104
  • 56
  • 326
  • 368
Mel
  • 2,337
  • 19
  • 25
  • For the record, Handcraftsman's "yield return"-based version performs much better, but I still like the "Hey, you're not supposed to be doing that" aspect of this code. – Mel Oct 14 '10 at 12:10
  • -1 This answer is wrong. If you replace division by modulo operator you get batchSize number of partitions which is more suited for this thread http://stackoverflow.com/questions/438188/split-a-collection-into-n-parts-with-linq – nawfal Dec 06 '12 at 13:57
  • I disagree. If you use modulo, then you're assigning items to the groups by the remainder. You'd get item 0 in group 0, item 1 in group 1, etc. This would totally scramble their order. Using the integer division, as I have done here, means that items 0-2 go in group 0, 3-5 go in group 1, etc. I believe this is what was intended. If you agree, please remove the -1 from my answer. – Mel Dec 06 '12 at 19:17
  • 1
    you're right. I missed the ToList part which actually executes the query before leaving. Why not use a lighter ToArray? I have to edit your answer to remove my -1. +1ed too! :) – nawfal Dec 06 '12 at 19:46
  • ToList is just what I used at the time, but ToArray ought to work just fine. – Mel Dec 07 '12 at 13:53
0

The simplest solution is given by Mel:

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, 
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}

Concise but slower. The above method splits an IEnumerable into chunks of desired fixed size with total number of chunks being unimportant. To split an IEnumerable into N number of chunks of equal sizes or close to equal sizes, you could do:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, 
                                                   int numOfParts)
{
    int i = 0;
    return items.GroupBy(x => i++ % numOfParts);
}

To speed up things, a straightforward approach would do:

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, 
                                                       int partitionSize)
{
    if (partitionSize <= 0)
        throw new ArgumentOutOfRangeException("partitionSize");

    int innerListCounter = 0;
    int numberOfPackets = 0;
    foreach (var item in items)
    {
        innerListCounter++;
        if (innerListCounter == partitionSize)
        {
            yield return items.Skip(numberOfPackets * partitionSize).Take(partitionSize);
            innerListCounter = 0;
            numberOfPackets++;
        }
    }

    if (innerListCounter > 0)
        yield return items.Skip(numberOfPackets * partitionSize);
}

This is faster than anything currently on planet now :) The equivalent methods for a Split operation here

Community
  • 1
  • 1
nawfal
  • 70,104
  • 56
  • 326
  • 368
  • whew, that is much faster! although http://stackoverflow.com/a/4835541/1037948 started edging you out after a couple runs in linqpad... ;) – drzaus Jul 10 '14 at 20:52
  • @drzaus keep in mind that that answer is a dangerous construct with side effects. Since the inner and outer runs on the same enumerator, you get results you do not expect. If you enumerate just the outer sequence, the inner sequences are no longer valid; or you can not iterate the inner sequences twice. Or simply try `var x = returnedResult.ElementAt(n).ToList();`, you get unexpected results. – nawfal Jul 11 '14 at 05:52
  • So I recently tried running some perf comparisons again ([tests+results here](https://gist.github.com/zaus/399ca9081912bd6efd7f#file-f2-results-md)) between just calling this method and doing something with the results, and it wasn't as fast vs some of the other suggestions. Am I missing something? – drzaus Jul 25 '14 at 19:20
  • @drzaus it was wonderful you did that benchmarking. I will see to your link in a week. I need some time now. Thanks :) will let you know. – nawfal Jul 26 '14 at 13:12
  • Recently realized the 'straightforward' solution results in duplicate enumeration/evaluation -- see explanation https://gist.github.com/zaus/4cae1a5b42475a083287 which is probably also pointed out on the "already answered" question – drzaus Jun 10 '15 at 17:16
  • I will see all that. Thanks. Gimme some time. – nawfal Jun 10 '15 at 17:19
0

Take won't be very efficient, because it doesn't remove the entries taken.

why not use a simple loop:

public IEnumerable<IList<T>> Partition<T>(this/* <-- see extension methods*/ IEnumerable<T> src,int num)  
{  
    IEnumerator<T> enu=src.getEnumerator();  
    while(true)  
    {  
        List<T> result=new List<T>(num);  
        for(int i=0;i<num;i++)  
        {  
            if(!enu.MoveNext())  
            {  
                if(i>0)yield return result;  
                yield break;  
            }  
            result.Add(enu.Current);  
        }  
        yield return result;  
    }  
}
Gabe
  • 84,912
  • 12
  • 139
  • 238
Zotta
  • 2,513
  • 1
  • 21
  • 27
0
from b in Enumerable.Range(0,8) select items.Where((x,i) => (i % 8) == b);
James Dunne
  • 3,607
  • 3
  • 24
  • 29
  • +1 But note this groups every 8th item, i.e. you get {0,8,16}, {1,9,17}, ... – Ruben Bartelink Nov 13 '12 at 14:56
  • This is more about splitting than partitioning - more suitable [here](http://stackoverflow.com/questions/438188/split-a-collection-into-n-parts-with-linq) – nawfal Dec 06 '12 at 14:20
  • Furthermore this might give empty inner IEnumerables if number 8 is more than the number of items in `items` which may or may not be desirable. I incorporated your answer in the other thread anyway.. – nawfal Dec 06 '12 at 14:26