
In several of my most recent projects, I've found the need to divide a single collection up into m batches of n elements.

An answer to a related question suggests using MoreLINQ's `Batch` method. That is my preferred solution (no need to reinvent the wheel and all that).

While the Batch method divides the input up in row-major format, would it also be possible to write a similar extension that divides the input up in column-major format? That is, given the input

```
{ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 }
```

you would call `ColumnBatch(4)`, generating the output

```
{
    { 1, 4, 7, 10 },
    { 2, 5, 8, 11 },
    { 3, 6, 9, 12 }
}
```

Does morelinq already offer something like this?

UPDATE: I'm changing the convention slightly. Instead of an input of 4, the input will be 3: the number of batches (columns) rather than the batch size.

The final extension method will have the signature

```csharp
public static IEnumerable<IEnumerable<T>> ToColumns<T>(this IEnumerable<T> source, int numberOfColumns)
```

which should be clear to whoever is implementing the method, and does not require multiple enumerations.
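A minimal sketch of `ToColumns` under the updated convention, assuming it is acceptable to buffer the source into a `List<T>` once (the first column cannot be completed without reading the whole source, so some buffering is unavoidable if the source is to be enumerated only once). Element k goes to column k % numberOfColumns. This is not a MoreLINQ method, and the class name `ColumnExtensions` is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnExtensions
{
    // Splits source into numberOfColumns batches in column-major order:
    // the element at position k lands in batch k % numberOfColumns.
    public static IEnumerable<IEnumerable<T>> ToColumns<T>(this IEnumerable<T> source, int numberOfColumns)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (numberOfColumns <= 0) throw new ArgumentOutOfRangeException(nameof(numberOfColumns));
        return ToColumnsIterator(source, numberOfColumns);
    }

    private static IEnumerable<IEnumerable<T>> ToColumnsIterator<T>(IEnumerable<T> source, int numberOfColumns)
    {
        // Enumerate the source exactly once, buffering every element.
        List<T> buffer = source.ToList();

        for (int column = 0; column < numberOfColumns; column++)
        {
            var batch = new List<T>();
            // Take every numberOfColumns-th element, starting at this column's offset.
            for (int index = column; index < buffer.Count; index += numberOfColumns)
            {
                batch.Add(buffer[index]);
            }
            yield return batch;
        }
    }
}
```

With the twelve-element input above, `ToColumns(3)` yields { 1, 4, 7, 10 }, { 2, 5, 8, 11 } and { 3, 6, 9, 12 }. If the source has fewer elements than `numberOfColumns`, the trailing columns come back empty.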

Rob Lyndon
  • This would mean you'd have to either materialize the whole enumeration or enumerate it multiple times. Either way, it seems like a strange thing to do with an enumeration. – nvoigt Feb 12 '14 at 15:29
  • It's not that strange. Fortran is built on column major format. – Rob Lyndon Feb 12 '14 at 15:31
  • I doubt Fortran has any notion of .NET IEnumerable, yield and deferred execution. It would work well on any materialized type like array, but on a basic IEnumerable, it would be strange, because it would enumerate multiple times and/or materialize it. – nvoigt Feb 12 '14 at 15:34
  • Maybe if I change the convention so that an input of 3 would generate the same output, that would avoid the multiple enumeration. – Rob Lyndon Feb 12 '14 at 15:45
  • `which should be clear to whoever is implementing the method, and does not require multiple enumerations.` It is *impossible* to solve the problem without either iterating the source multiple times, or materializing the entire query into a single collection before yielding any items (so that that collection can be iterated multiple times). – Servy Feb 12 '14 at 15:53
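To illustrate the point about the original batch-size convention: with `ColumnBatch(4)` the number of batches is ceil(count / 4), so the total count must be known before the first batch can be produced. A sketch, assuming a single materialization into a `List<T>` is acceptable; `ColumnBatch` is the hypothetical name from the question, and `ColumnBatchExtensions` is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColumnBatchExtensions
{
    // batchSize is the maximum number of elements per batch, so the number
    // of batches depends on the total count of the source.
    public static IEnumerable<IEnumerable<T>> ColumnBatch<T>(this IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        // The count is unknown until the whole sequence has been read,
        // so materialize it once instead of enumerating it twice.
        List<T> buffer = source.ToList();
        int numberOfBatches = (buffer.Count + batchSize - 1) / batchSize; // ceiling division

        // Built eagerly: batch b takes every numberOfBatches-th element from offset b.
        var batches = new List<IEnumerable<T>>(numberOfBatches);
        for (int b = 0; b < numberOfBatches; b++)
        {
            var batch = new List<T>();
            for (int index = b; index < buffer.Count; index += numberOfBatches)
            {
                batch.Add(buffer[index]);
            }
            batches.Add(batch);
        }
        return batches;
    }
}
```

For 12 elements this gives the three batches from the question; for 13 elements it gives four batches, { 1, 5, 9, 13 }, { 2, 6, 10 }, { 3, 7, 11 } and { 4, 8, 12 }, none longer than 4.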

1 Answer

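```csharp
using System.Linq;

int[] arr = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
int i = 0;
// The element at position k goes to group k % 3, i.e. one group per column (3 columns).
var result = arr.GroupBy(x => i++ % 3).Select(g => g.ToList()).ToList();
```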
L.B
  • Now add 13 to the end of the array and watch this break. That's why you shouldn't use magic numbers. – Servy Feb 12 '14 at 15:36
  • +1, I like this solution more, except for the closure, but it is OK in this case. – Arsen Mkrtchyan Feb 12 '14 at 15:37
  • @L.B Then add 13, 14, 15, 16 and *still* watch it break. A hard coded `3` here is wrong. – Servy Feb 12 '14 at 15:38
  • @Servy I think OP can get what is meant in that answer. – L.B Feb 12 '14 at 15:39
  • So apparently you don't care that it doesn't solve the problem, so long as it works for the single example provided. That it doesn't work for any other is irrelevant. – Servy Feb 12 '14 at 15:41
  • Still, that is the point nvoigt is making. In the form I put the question, you'd call it with an input of 4, and to get the 3 you'd need to divide the size of the array by the input; hence the multiple enumeration. – Rob Lyndon Feb 12 '14 at 15:42
  • So you care about solving the problem, but you don't care about a comment that demonstrates that you have a bug in your solution to the problem. – Servy Feb 12 '14 at 15:43
  • @Servy What do you mean by "break" in your comments? – Jon Senchyna Feb 12 '14 at 15:59
  • @JonSenchyna Return improper results. This code says, "create 3 batches of size N" while the requirements state "create M batches of size 4". If there are 9-12 items in the collection this just *coincidentally* works, because 4 columns will result in 3 batches for those values. For a collection of any other size it is wrong. – Servy Feb 12 '14 at 16:02
  • @Servy The question was changed to create N batches. Because of that, this code creates the appropriate results (assuming N is 3). From there, it's simply a matter of replacing 3 with a variable. – Jon Senchyna Feb 12 '14 at 16:05
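Following the suggestion to replace the 3 with a variable, here is a sketch of the answer's `GroupBy` approach with the column count as a parameter. It also takes the element index from the `Select((item, index) => ...)` overload rather than incrementing a captured `i`, which sidesteps the closure concern: a captured counter keeps counting if a deferred query is enumerated more than once. `GroupByColumns` and `SplitIntoColumns` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByColumns
{
    // Groups elements by (index % numberOfColumns). LINQ to Objects' GroupBy
    // yields groups in order of first key occurrence and keeps source order
    // inside each group, so group c holds column c.
    public static List<List<T>> SplitIntoColumns<T>(IEnumerable<T> source, int numberOfColumns)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (numberOfColumns <= 0) throw new ArgumentOutOfRangeException(nameof(numberOfColumns));

        return source
            .Select((item, index) => new { item, index })          // index supplied by LINQ, no captured counter
            .GroupBy(x => x.index % numberOfColumns, x => x.item)  // key = column, element = original item
            .Select(g => g.ToList())
            .ToList();
    }
}
```

This covers the updated (number-of-columns) convention only; the original batch-size convention still needs the total count. Note also that a column with no elements produces no group at all, so a source shorter than `numberOfColumns` returns fewer than `numberOfColumns` lists.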