8

I'm having a problem knowing the best way to make a method to group a list of items into groups of (for example) no more than 3 items. I've created the method below, but without doing a ToList on the group before I return it, I have a problem with it if the list is enumerated multiple times.

The first time it's enumerated is correct, but any additional enumeration is thrown off because the two variables (i and groupKey) appear to be remembered between the iterations.

So the questions are:

  • Is there a better way to do what I'm trying to achieve?
  • Is simply ToListing the resulting group before it leaves this method really such a bad idea?

    public static IEnumerable<IGrouping<int, TSource>> GroupBy<TSource>
                  (this IEnumerable<TSource> source, int itemsPerGroup)
    {
        const int initial = 1;
        int i = initial;
        int groupKey = 0;
    
        var groups = source.GroupBy(x =>
        {
            if (i == initial)
            {
                groupKey = 0;
            }
    
            if (i > initial)
            {
                //Increase the group key if we've counted past the items per group
                if (itemsPerGroup == initial || i % itemsPerGroup == 1)
                {
                    groupKey++;
                }
            }
    
            i++;
    
            return groupKey;
        });
    
        return groups;
    }
    
Venson
  • 1,772
  • 17
  • 37
ajbrun
  • 161
  • 2
  • 13

5 Answers5

17

Here's one way to do this using LINQ...

public static IEnumerable<IGrouping<int, TSource>> GroupBy<TSource>
    (this IEnumerable<TSource> source, int itemsPerGroup)
{
    return source.Zip(Enumerable.Range(0, source.Count()),
                      (s, r) => new { Group = r / itemsPerGroup, Item = s })
                 .GroupBy(i => i.Group, g => g.Item)
                 .ToList();
}

Live Demo

Anthony Chu
  • 37,170
  • 10
  • 81
  • 71
  • Thanks Anthony, I've been experimenting with this solution as well, as it keeps the IGrouping. The only change I've made to it is to remove the .ToList at the end. Is this necessary? Admittedly, without this line, I get the error "**Results View = The type '<>f__AnonymousType0' exists in both 'Microsoft.VisualStudio.TestPlatform.Extensions.VSTestIntegration.dll' and 'MyAssembly.Core.dll'**" when displaying the results of the enumeration of each group when debugging. I've tried to resolve this without having the ToList, but haven't been successful. Any ideas? – ajbrun May 29 '14 at 09:08
10

I think you are looking for something like this:

return source.Select((x, idx) => new { x, idx })
      .GroupBy(x => x.idx / itemsPerGroup)
      .Select(g => g.Select(a => a.x));

You need to change your return type as IEnumerable<IEnumerable<TSource>>

Selman Genç
  • 100,147
  • 13
  • 119
  • 184
  • Unless there's any reason to take another solution, I'll take this as the answer. I've very roughly benchmarked this one and Anthony's on .NET fiddle, but this seems a little faster. – ajbrun May 28 '14 at 21:25
  • this is actually a good solution - though not as straight as Anthony's. if possible you should describe how this sorcery is done.. i'm intrigued especially on the `.GroupBy(x => x.idx / itemsPerGroup)` behavior. it utilise the fact that the computation will be rounded down and result in several items have the same 'bucket'. terrifically efficient. – Bagus Tesa Jan 08 '18 at 16:17
4

The problem with using GroupBy() is that unless it somehow has knowledge under the hood that the input is ordered by key value, it has to read the entire sequence and allocate everything to its bucket before it can emit a single group. That's overkill in this case, since the key is a function of its ordinal position within the sequence.

I like the source.Skip(m).Take(n) approach, but that makes assumptions that items in source can be directly addressed. If that's not true or Skip() and Take() have no knowledge of the underlying implementation, then the production of each group is going to be an O(n/2) operation on the average, as it repeatedly iterates over source to produce the group.

This makes the overall partitioning operation, potentially quite expensive.

  • IF producing a group is an O(n/2) operation on the average, and
  • Given a group size of s, the production of approximately n/s groups is required,

Then the total cost of the operation is something like O(n2/2s), right?

So, I would do something this, an O(n) operation (feel free to use an IGrouping implementation if you'd like):

public static IEnumerable<KeyValuePair<int,T[]>> Partition<T>( this IEnumerable<T> source , int partitionSize )
{
  if ( source        == null ) throw new ArgumentNullException("source") ;
  if ( partitionSize <  1    ) throw new ArgumentOutOfRangeException("partitionSize") ;

  int     i         = 0 ;
  List<T> partition = new List<T>( partitionSize ) ;

  foreach( T item in source )
  {
    partition.Add(item) ;
    if ( partition.Count == partitionSize )
    {
      yield return new KeyValuePair<int,T[]>( ++i , partition.ToArray() ) ;
      partition.Clear() ;
    }
  }

  // return the last partition if necessary
  if ( partition.Count > 0 )
  {
    yield return new Partition<int,T>( ++i , items.ToArray() ) ;
  }

}
Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
3

.net Fiddle

Essentially you have an IEnumerable, and you want to group it into an IEnumerable of IGroupables which each contain the key as an index and the group as the values. Your version does seem to accomplish on the first pass, but I think that you can definitely stream line a little bit.

Using skip and take is the most desirable way to accomplish in my opinion, but the custom key for grouping is where there is an issue. There is a way around this which is to create your own class as a grouping template (seen in this answer: https://stackoverflow.com/a/5073144/1026459).

The end result is this:

public static class GroupExtension
{
    public static IEnumerable<IGrouping<int, T>> GroupAt<T>(this IEnumerable<T> source, int itemsPerGroup)
    {
        for(int i = 0; i < (int)Math.Ceiling( (double)source.Count() / itemsPerGroup ); i++)
        {
            var currentGroup = new Grouping<int,T>{ Key = i };
            currentGroup.AddRange(source.Skip(itemsPerGroup*i).Take(itemsPerGroup));
            yield return currentGroup;
        }
    }
    private class Grouping<TKey, TElement> : List<TElement>, IGrouping<TKey, TElement>
    {
        public TKey Key { get; set; }
    }
}

And here is the demo in the fiddle which consumes it on a simple string

public class Program
{
    public void Main(){
        foreach(var p in getLine().Select(s => s).GroupAt(3))
            Console.WriteLine(p.Aggregate("",(s,val) => s += val));
    }
    public string getLine(){ return "Hello World, how are you doing, this just some text to show how the grouping works"; }
}

edit

Alternatively as just an IEnumerable of IEnumerable

public static IEnumerable<IEnumerable<T>> GroupAt<T>(this IEnumerable<T> source, int itemsPerGroup)
{
    for(int i = 0; i < (int)Math.Ceiling( (double)source.Count() / itemsPerGroup ); i++)
        yield return source.Skip(itemsPerGroup*i).Take(itemsPerGroup);
}
Community
  • 1
  • 1
Travis J
  • 81,153
  • 41
  • 202
  • 273
0

This is based on Selman's Select with index idea, but using ToLookup to combine both the GroupBy and Select together as one:

public static IEnumerable<IEnumerable<TSource>> GroupBy<TSource>
        (this IEnumerable<TSource> source, int itemsPerGroup)
{    
    return source.Select((x, idx) => new { x, idx })
            .ToLookup(q => q.idx / itemsPerGroup, q => q.x);
}

The main difference though is that ToLookup actually evaluates results immediately (as concisely explained here: https://stackoverflow.com/a/11969517/7270462), which may or may not be desired.

ErrCode
  • 186
  • 2
  • 11