32

I have an IEnumerable<string> which I would like to split into groups of three, so if my input had 6 items I would get an IEnumerable<IEnumerable<string>> returned with two items, each of which would be an IEnumerable<string> containing three of my strings.

I am looking for how to do this with LINQ rather than a simple for loop.

Thanks

Don Kirkby
Kev Hunter

8 Answers

34

This is a late reply to this thread, but here is a method that doesn't use any temporary storage:

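public static class EnumerableExt
{
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> input, int blockSize)
    {
        // Dispose the enumerator once the outer sequence has been fully consumed.
        using (var enumerator = input.GetEnumerator())
        {
            // Each successful MoveNext() positions the enumerator on the first item of a block.
            while (enumerator.MoveNext())
            {
                yield return nextPartition(enumerator, blockSize);
            }
        }
    }

    // Streams up to blockSize items from the shared enumerator, so each block
    // must be consumed before the caller asks for the next one.
    private static IEnumerable<T> nextPartition<T>(IEnumerator<T> enumerator, int blockSize)
    {
        do
        {
            yield return enumerator.Current;
        }
        while (--blockSize > 0 && enumerator.MoveNext());
    }
}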

And some test code:

class Program
{
    static void Main(string[] args)
    {
        var someNumbers = Enumerable.Range(0, 10000);

        foreach (var block in someNumbers.Partition(100))
        {
            Console.WriteLine("\nStart of block.");

            foreach (int number in block)
            {
                Console.Write(number);
                Console.Write(" ");
            }
        }

        Console.WriteLine("\nDone.");
        Console.ReadLine();
    }
}

However, do note the comments below for the limitations of this approach:

  1. If you change the foreach in the test code to foreach (var block in someNumbers.Partition(100).ToArray()) then it doesn't work any more.

  2. It isn't threadsafe.
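The first limitation can be seen in a small sketch. It repeats the Partition method so that it compiles on its own; the PartitionDemo class and its method names are made up for illustration. Copying each block as soon as it is produced keeps the shared enumerator in step, whereas calling ToArray() on the outer sequence first advances the enumerator once per "block":

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableExt
{
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> input, int blockSize)
    {
        var enumerator = input.GetEnumerator();
        while (enumerator.MoveNext())
        {
            yield return NextPartition(enumerator, blockSize);
        }
    }

    private static IEnumerable<T> NextPartition<T>(IEnumerator<T> enumerator, int blockSize)
    {
        do
        {
            yield return enumerator.Current;
        }
        while (--blockSize > 0 && enumerator.MoveNext());
    }
}

// Hypothetical demo class, not part of the original answer.
public static class PartitionDemo
{
    // Safe: every block is copied to an array before the outer sequence advances.
    public static int[][] Materialized(IEnumerable<int> source, int blockSize)
    {
        return source.Partition(blockSize).Select(block => block.ToArray()).ToArray();
    }

    // Unsafe: the outer ToArray() runs first, so the shared enumerator moves
    // one item per outer step and the number of "blocks" equals the item count.
    public static int BlockCountWhenOuterIsMaterializedFirst(IEnumerable<int> source, int blockSize)
    {
        return source.Partition(blockSize).ToArray().Length;
    }

    public static void Main()
    {
        var numbers = Enumerable.Range(0, 10);
        Console.WriteLine(Materialized(numbers, 3).Length);                   // 4
        Console.WriteLine(BlockCountWhenOuterIsMaterializedFirst(numbers, 3)); // 10
    }
}
```

The same Select(block => block.ToArray()) step is one way to hand the blocks to other threads, since the copies no longer touch the shared enumerator.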

Matthew Watson
  • +1 for preserving the enumerability of the original. Just the solution I was looking for to work with a `BlockingCollection`. – Tim Rogers Sep 20 '12 at 09:45
  • you deserve +2 for including test code. not as concise as some other examples, but most accurate in terms of keeping to ienumerables and limiting iterations over the collection. well done! – mtazva Sep 21 '12 at 21:14
  • 4
    NOTE: It is critical to understand that this code is NOT thread safe! You'd have to transform the resulting ienumerables to a concrete type, synchronously, prior to passing the results to async code to ensure the items are properly collected in batches. – mtazva Sep 24 '12 at 19:27
  • By far the best solution. Most answers just iterate over the collection and store the results in a List, or worse, iterate multiple times over the same items! – bouvierr Mar 27 '14 at 21:24
  • 1
    A problem with this code is that the enumerator is passed by reference, so it breaks down if you iterate breadth-first: if you change one unit test line to foreach (var block in someNumbers.Partition(100).ToArray()), it all breaks down. – realbart Jul 22 '16 at 11:57
  • As @realbart said, this code is flawed. Passing the same enumerator to nextPartition is not safe at all. When evaluated lazily (as IEnumerable should be), the enumerator will get exhausted at unpredictable times. – markonius Mar 22 '20 at 22:50
33
var result = sequence.Select((s, i) => new { Value = s, Index = i })
                     .GroupBy(item => item.Index / 3, item => item.Value);

Note that this will return an IEnumerable<IGrouping<int,string>>, which is functionally similar to what you want. However, if you strictly need to type it as IEnumerable<IEnumerable<string>> (to pass it to a method that expects that type in C# 3.0, which doesn't support generic variance), you should use Enumerable.Cast:

var result = sequence.Select((s, i) => new { Value = s, Index = i })
                     .GroupBy(item => item.Index / 3, item => item.Value)
                     .Cast<IEnumerable<string>>();
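
For comparison, here is a minimal sketch assuming C# 4.0 or later, where IEnumerable<out T> is covariant and the grouping can be returned as IEnumerable<IEnumerable<string>> without the Cast. The GroupByDemo class and the InThrees method are names invented for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of the original answer.
public static class GroupByDemo
{
    public static IEnumerable<IEnumerable<string>> InThrees(IEnumerable<string> sequence)
    {
        // IGrouping<int, string> implements IEnumerable<string>, and from C# 4.0 on
        // IEnumerable<out T> is covariant, so the conversion is implicit.
        return sequence.Select((s, i) => new { Value = s, Index = i })
                       .GroupBy(item => item.Index / 3, item => item.Value);
    }

    public static void Main()
    {
        var input = new[] { "a", "b", "c", "d", "e", "f", "g" };

        // Prints three lines: "a,b,c", "d,e,f" and "g".
        foreach (var group in InThrees(input))
        {
            Console.WriteLine(string.Join(",", group));
        }
    }
}
```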
Mehrdad Afshari
  • 1
    That was unbelievably quick, thanks – Kev Hunter Aug 28 '09 at 21:36
  • 2
    Does the GroupBy have to iterate the whole sequence before you get any results, or do you still get deferred execution here? – Don Kirkby Nov 14 '09 at 00:27
  • @Don Kirkby: For LINQ to Objects, `.GroupBy` doesn't enumerate the sequence. It enumerates the whole sequence as soon as `.GetEnumerator` is called on it (e.g. when used in `foreach` or something). – Mehrdad Afshari Nov 14 '09 at 06:19
  • 2
    Right @Don, GroupBy's evaluation isn't as lazy as the other Linq methods. It enumerates all the sequence before returning any groups. – Colonel Panic Jul 11 '12 at 22:34
  • 1
    I think the cast is not required. `IGrouping` inherits `IEnumerable` and result can be declared as `IEnumerable<IEnumerable<string>>` – Anupam Jan 30 '13 at 02:37
22

I know this has already been answered, but if you plan on taking slices of IEnumerables often, then I recommend making a generic extension method like this:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int chunkSize)
{
    return source.Where((x,i) => i % chunkSize == 0).Select((x,i) => source.Skip(i * chunkSize).Take(chunkSize));
}

Then you can use sequence.Split(3) to get what you want.

(You can name it something else, like 'Slice' or 'Chunk', if you don't like that 'Split' has already been defined for strings. 'Split' is just what I happened to call mine.)
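
One thing worth checking with this one-liner is how often the source gets enumerated. The sketch below repeats the Split method and adds a hypothetical CountEnumerations helper (my own name, not part of the answer) that counts how many times a generated source is started from the beginning:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitDemo
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int chunkSize)
    {
        return source.Where((x, i) => i % chunkSize == 0)
                     .Select((x, i) => source.Skip(i * chunkSize).Take(chunkSize));
    }

    // Hypothetical helper: returns how many times the source was enumerated
    // while every chunk was read to the end.
    public static int CountEnumerations(int itemCount, int chunkSize)
    {
        int enumerations = 0;

        IEnumerable<int> Source()
        {
            enumerations++; // runs each time a fresh enumeration starts
            for (int i = 0; i < itemCount; i++)
            {
                yield return i;
            }
        }

        // Force the outer sequence and every chunk to be evaluated.
        Source().Split(chunkSize).Select(chunk => chunk.ToList()).ToList();
        return enumerations;
    }

    public static void Main()
    {
        // 9 items in chunks of 3: once for the Where, plus once per chunk.
        Console.WriteLine(CountEnumerations(9, 3)); // 4
    }
}
```

So the conciseness comes at the cost of re-reading the source, which matters for sources that are slow or cannot be enumerated twice.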

brunosp86
diceguyd30
  • +1. I like the fact that you're able to achieve the same result as I did with one line of code. – Alex Essilfie Oct 14 '10 at 16:53
  • It's been a while and I've been using your code (solution/answer/whatever you call it) for a while and it works perfectly. I recently tried to analyse your code and I could not grasp the `.Where((x,i) => i % chunkSize == 0)` part of your code, it still works okay. If you don't mind, could you explain how your code works to me? Thanks. – Alex Essilfie May 05 '11 at 18:56
  • 2
    @Alex Certainly! Say your collection is 9 items in length and you want to split it into groups of 3. All that expression really does is figure out how many groups are going to be made. As you can see, I'm only really interested in the indices in the `Where` and the `Select`. I go from having indices '0-8' in the `Where` to having '0-2' in the `Select` since the `Where` clause will only return 3 out of the 9 items (check result of `Enumerable.Range(0,9).Select((x,i) => i % 3)` for proof!). So I first skip 0 (0 * 3) and take 3, then skip 3 (1 * 3) and take 3 then skip 6 (2 * 3) and take 3! – diceguyd30 May 05 '11 at 20:08
  • 6
    Only problem with solution is that it will iterate over the source n+1 times where n is the number of chunks. This is both problematic from a performance perspective and dealing with sources that can't be re-enumerated. – Arne Claassen Feb 03 '12 at 21:58
  • @ArneClaassen You are absolutely correct. The best version of a chunk method would use a for loop with a yield retu... *scrolls down* ...I see you already posted one. ^.^ I'm not going to lie, I use the above method only because of how concise it is. I'm a sucker for one-liners. :P – diceguyd30 Feb 06 '12 at 13:56
16

Inspired by @diceguyd30's implementation, I wanted to create a version that only iterates over the source once and doesn't build the whole result set in memory to compensate. The best I've come up with is this:

public static IEnumerable<IEnumerable<T>> Split2<T>(this IEnumerable<T> source, int chunkSize) {
    var chunk = new List<T>(chunkSize);
    foreach(var x in source) {
        chunk.Add(x);
        // Keep filling until the chunk holds exactly chunkSize items.
        if(chunk.Count < chunkSize) {
            continue;
        }
        yield return chunk;
        chunk = new List<T>(chunkSize);
    }
    // Yield the final, partially filled chunk, if there is one.
    if(chunk.Any()) {
        yield return chunk;
    }
}

This way I build each chunk on demand. I wish I could avoid the List<T> and stream each chunk as well, but I haven't figured that out yet.

Arne Claassen
  • 1
    +1 a great implementation. Pretty much exactly as Jon Skeet did it: http://code.google.com/p/morelinq/source/browse/trunk/MoreLinq/Batch.cs – diceguyd30 Feb 06 '12 at 14:05
  • 3
    +1 this seems to be very efficient, but I think it has a bug in the following line: `if(chunk.Count <= chunkSize)` The correct line is as follows: `if(chunk.Count < chunkSize)` – Arne Lund May 23 '12 at 19:28
2

Using Microsoft.Reactive, you can do this pretty simply, and you will iterate through the source only once.

IEnumerable<string> source = new List<string>{"1", "2", "3", "4", "5", "6"};

IEnumerable<IEnumerable<string>> splited = source.ToObservable().Buffer(3).ToEnumerable();
Martijn Pieters
BBeau
  • 1
    You don't need to use ToObservable and then back to ToEnumerable, you can use the Interactive Extensions Buffer method that works with Enumerable. Look up Ix-Main on nuget. – Niall Connaughton Oct 23 '13 at 04:10
  • If using `Reactive` you NEED ToObservable. Otherwise you have to use `Interactive` – Emaborsa Dec 22 '16 at 09:18
2

We can improve @Afshari's solution to do true lazy evaluation. We use a GroupAdjacentBy method that yields groups of consecutive elements with the same key:

sequence
.Select((x, i) => new { Value = x, Index = i })
.GroupAdjacentBy(x=>x.Index/3)
.Select(g=>g.Select(x=>x.Value))

Because the groups are yielded one-by-one, this solution works efficiently with long or infinite sequences.
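
GroupAdjacentBy is not part of the standard LINQ operators, so it has to be supplied separately. What follows is only a minimal sketch of how such a method might look, not necessarily the implementation this answer had in mind; the class name, the signature, and the choice to buffer each group in a List<T> are all assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical implementation, written for illustration only.
public static class AdjacentGroupingExtensions
{
    public static IEnumerable<IEnumerable<T>> GroupAdjacentBy<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        List<T> group = null;
        TKey currentKey = default(TKey);

        foreach (var item in source)
        {
            var key = keySelector(item);

            // A change of key closes the current group and hands it to the caller.
            if (group != null && !comparer.Equals(key, currentKey))
            {
                yield return group;
                group = null;
            }

            if (group == null)
            {
                group = new List<T>();
                currentKey = key;
            }

            group.Add(item);
        }

        // Hand over the last group once the source is exhausted.
        if (group != null)
        {
            yield return group;
        }
    }
}

public static class AdjacentGroupingDemo
{
    // An endless source: 0, 1, 2, ... (only a few items are ever read here).
    private static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
        {
            yield return i;
        }
    }

    public static List<List<int>> FirstGroupsOfThree(int groupCount)
    {
        return Naturals()
            .Select((x, i) => new { Value = x, Index = i })
            .GroupAdjacentBy(x => x.Index / 3)
            .Select(g => g.Select(x => x.Value).ToList())
            .Take(groupCount)
            .ToList();
    }

    public static void Main()
    {
        // Prints "0,1,2" and then "3,4,5".
        foreach (var group in FirstGroupsOfThree(2))
        {
            Console.WriteLine(string.Join(",", group));
        }
    }
}
```

This version holds one group in memory at a time, which is enough for the groups to be produced one by one from an endless source.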

Colonel Panic
1

Mehrdad Afshari's answer is excellent. Here is an extension method that encapsulates it:

using System.Collections.Generic;
using System.Linq;

public static class EnumerableExtensions
{
    public static IEnumerable<IEnumerable<T>> GroupsOf<T>(this IEnumerable<T> enumerable, int size)
    {
        return enumerable.Select((v, i) => new {v, i}).GroupBy(x => x.i/size, x => x.v);
    }
}
Ronnie Overby
0

I came up with a different approach. It uses a while loop, but the pages are held in a list as deferred queries, so, as with regular LINQ, their items are not read until needed.
Here's the code:

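// Extension methods must be static and declared in a static class.
public static IEnumerable<IEnumerable<T>> Paginate<T>(this IEnumerable<T> source, int pageSize)
{
    List<IEnumerable<T>> pages = new List<IEnumerable<T>>();
    // Count() enumerates the source once, so call it a single time up front.
    int itemCount = source.Count();
    int skipCount = 0;

    while (skipCount * pageSize < itemCount) {
        // Each page is a deferred query; it re-reads the source when enumerated.
        pages.Add(source.Skip(skipCount * pageSize).Take(pageSize));
        skipCount += 1;
    }

    return pages;
}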
Alex Essilfie
  • 1
    Never saw `Runtime.CompilerServices.Extension` before so I looked it up and the MSDN says "In C#, you do not need to use this attribute; you should use the `this` modifier for the first parameter to create an extension method." So in other words, while it's functionally equivalent, it's preferred to use `public IEnumerable<IEnumerable<T>> Paginate<T>(this IEnumerable<T> source, int pageSize)` rather than the attribute. – Davy8 Jul 05 '11 at 20:54
  • @Davy8: I was just learning C# around the time I submitted this answer so the code is a direct port from VB.NET. I've now gained some mastery over C# and I know that this code isn't really 'correct'. I've updated the answer now. – Alex Essilfie Jul 06 '11 at 02:54
  • how did you get `source.Count`? its an ienumerable, and doing `Count()` on it enumerates the collection once. – nawfal Feb 18 '13 at 14:06