1

I'm trying to split a collection into an specific number of parts, I've taken some help seeying solutions over StackOverflow: Split a collection into `n` parts with LINQ?

This is my VB.Net translation from @Hasan Khan solution:

''' <summary>
''' Splits an <see cref="IEnumerable(Of T)"/> into the specified amount of secuences.
''' </summary>
Public Shared Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                            ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))

    Dim i As Integer = 0

    Dim splits As IEnumerable(Of IEnumerable(Of T)) =
                 From item As T In col
                 Group item By item = Threading.Interlocked.Increment(i) Mod amount
                 Into Group
                 Select Group.AsEnumerable()

    Return splits


End Function

And this my VB.Net translation of @manu08 solution:

''' <summary>
''' Splits an <see cref="IEnumerable(Of T)"/> into the specified amount of secuences.
''' </summary>
Public Shared Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                            ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))

    Return col.Select(Function(item, index) New With {index, item}).
               GroupBy(Function(x) x.index Mod amount).
               Select(Function(x) x.Select(Function(y) y.item))

End Function

The problem is that both functions returns a wrong result.

Because if I split a collection like this:

Dim mainCol As IEnumerable(Of Integer) = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

Dim splittedCols As IEnumerable(Of IEnumerable(Of Integer)) =
    SplitIntoParts(col:=mainCol, amount:=2)

Both functions gives this result:

1: { 1, 3, 5, 7, 9 }
2: { 2, 4, 6, 8, 10 }

Instead of these secuences:

1: { 1, 2, 3, 4, 5 } 
2: { 6, 7, 8, 9, 10 }

What I'm doing wrong?.

Community
  • 1
  • 1
ElektroStudios
  • 19,105
  • 33
  • 200
  • 417
  • 1
    I am not a LINQ expert.. but `mod` is wrong here. You'll want to group by some kind of count. – Sam Axe Apr 01 '15 at 10:34
  • http://stackoverflow.com/questions/5215469/use-linq-to-break-up-listt-into-lots-of-listt-of-n-length - in particular the `Partition()` method – Ric Apr 01 '15 at 10:52
  • @Ric thanks, I've tried it but that is for split the collection into a number of elements per secuence, I'm trying to split a collection into number of secuences. – ElektroStudios Apr 01 '15 at 10:56
  • @Sam Axe thanks for comment, but the C# % operator in VB.Net is equivalent to Mod, I'm not right? – ElektroStudios Apr 01 '15 at 11:00
  • 1
    Yes, `Mod` is the equivalent of `%`. But if you want a sequence, `Mod` is the wrong operation to be performing. – Sam Axe Apr 01 '15 at 11:02

3 Answers3

3

MyExtensions class has two public Split methods:

  1. For ICollection - iterates through collection only once - for splitting.
  2. For IEnumerable - iterates through enumerable twice: for count items and for split them. Do not use this if possible (first one is safe and twice faster).

More of that: this algorithm is trying to bring back exactly specified number of collections.

public static class MyExtensions
{
    // Works with ICollection - iterates through collection only once.
    public static IEnumerable<IEnumerable<T>> Split<T>(this ICollection<T> items, int count)
    {
        return Split(items, items.Count, count);
    }

    // Works with IEnumerable and iterates items TWICE: first for count items, second to split them.
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, int count)
    {            
        // ReSharper disable PossibleMultipleEnumeration
        var itemsCount = items.Count();
        return Split(items, itemsCount, count);
        // ReSharper restore PossibleMultipleEnumeration
    }

    private static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, int itemsCount, int partsCount)
    {
        if (items == null)
            throw new ArgumentNullException("items");
        if (partsCount <= 0)
            throw new ArgumentOutOfRangeException("partsCount");

        var rem = itemsCount % partsCount;
        var min = itemsCount / partsCount;
        var max = rem != 0 ? min + 1 : min;

        var index = 0;
        var enumerator = items.GetEnumerator();

        while (index < itemsCount)
        {
            var size = 0 < rem-- ? max : min;
            yield return SplitPart(enumerator, size);
            index += size;
        }
    }

    private static IEnumerable<T> SplitPart<T>(IEnumerator<T> enumerator, int count)
    {
        for (var i = 0; i < count; i++)
        {
            if (!enumerator.MoveNext())
                break;
            yield return enumerator.Current;
        }            
    }
}

Example program:

var items = new [] {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'};

for(var i = 1; i <= items.Length + 3; i++)
{
    Console.WriteLine("{0} part(s)", i);
    foreach (var part in items.Split(i))
        Console.WriteLine(string.Join(", ", part));
    Console.WriteLine();
}

... and output of this program:

1 part(s)
a, b, c, d, e, f, g, h, i, j

2 part(s)
a, b, c, d, e
f, g, h, i, j

3 part(s)
a, b, c, d
e, f, g
h, i, j

4 part(s)
a, b, c
d, e, f
g, h
i, j

5 part(s)
a, b
c, d
e, f
g, h
i, j

6 part(s)
a, b
c, d
e, f
g, h
i
j

7 part(s)
a, b
c, d
e, f
g
h
i
j

8 part(s)
a, b
c, d
e
f
g
h
i
j

9 part(s)
a, b
c
d
e
f
g
h
i
j

10 part(s)
a
b
c
d
e
f
g
h
i
j

11 part(s) // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j

12 part(s) // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j

13 part(s)  // Only 10 items in collection.
a
b
c
d
e
f
g
h
i
j
rtf_leg
  • 1,789
  • 1
  • 15
  • 27
  • its amazing, thanks, I actually was performing an error-handling the "amount" value and throwing an exception when the value was not a multiplier of 'TheCollection.Count' to avoid generating empty secuences with @Szer's solution, your solution takes in consideration that feature. – ElektroStudios Apr 01 '15 at 18:59
1

You're not doing something wrong; it's just that the methods you use don't keep the ordering the way you want. Think about how mod and GroupBy work and you see why.

I suggest you use Jon Skeet's answer, since it keeps the ordering of your collection (I took the liberty to translate it for you into VB.Net).

You just have to calculate the size of each partition beforehand, since it does not split a collection into n chunks, but into chunks of a length of n:

<Extension> _
Public Shared Iterator Function Partition(Of T)(source As IEnumerable(Of T), size As Integer) As IEnumerable(Of IEnumerable(Of T)) 
    Dim array__1 As T() = Nothing
    Dim count As Integer = 0
    For Each item As T In source
        If array__1 Is Nothing Then
            array__1 = New T(size - 1) {}
        End If
        array__1(count) = item
        count += 1
        If count = size Then
            yield New ReadOnlyCollection(Of T)(array__1)
            array__1 = Nothing
            count = 0
        End If
    Next
    If array__1 IsNot Nothing Then
        Array.Resize(array__1, count)
        yield New ReadOnlyCollection(Of T)(array__1)
    End If
End Function

And to use it:

mainCol.Partition(CInt(Math.Ceiling(mainCol.Count() / 2)))

Feel free to hide the Partition(CInt(Math.Ceiling(...)) part in a new method.

Community
  • 1
  • 1
sloth
  • 99,095
  • 21
  • 171
  • 219
  • Thanks for explaining about the ordering and the code – ElektroStudios Apr 01 '15 at 11:18
  • But your function splits into a number of elements per collection, not into a number of collections, at least that is what my VB translation does, I'm wrong? – ElektroStudios Apr 01 '15 at 11:23
  • 1
    @ElektroStudios Yes, the function splits into a number of elements per collection, as I said in my answer; hence you have to calculate the number elements in each collection, as I also said in my answer: `CInt(Math.Ceiling(mainCol.Count() / amount))`, which is exactly what Szer does in his answer, expect that the function in my solution is more efficient, since it does not iterate multiple times over the source enumerable. – sloth Apr 01 '15 at 11:37
1

Inefficient solution (too many iterations over data):

class Program
{
    static void Main(string[] args)
    {
        var data = Enumerable.Range(1, 10);
        var result = data.Split(2);            
    }
}

static class Extensions
{
    public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> col, int amount)
    {
        var chunkSize = (int)Math.Ceiling((double)col.Count() / (double)amount);

        for (var i = 0; i < amount; ++i)
            yield return col.Skip(chunkSize * i).Take(chunkSize);
    }
}

EDIT:

In VB.Net

Public Shared Iterator Function SplitIntoParts(Of T)(ByVal col As IEnumerable(Of T),
                                                     ByVal amount As Integer) As IEnumerable(Of IEnumerable(Of T))

    Dim chunkSize As Integer = CInt(Math.Ceiling(CDbl(col.Count()) / CDbl(amount)))

    For i As Integer = 0 To amount - 1
        Yield col.Skip(chunkSize * i).Take(chunkSize)
    Next

End Function
Szer
  • 3,426
  • 3
  • 16
  • 36
  • I'm having problems with the function, seems to be unperfect in some way, If I try to split a collection of 10 items into 10 secuences it should produce 10 collections of 1 element each, but dont. I'm investigating why – ElektroStudios Apr 01 '15 at 11:21
  • 1
    This method iterates *IEumerable col* **(1 + amount) times**. First time to count itmes, next times in each iteration of for loop. It is working, but it is not a effective solution. First of all: it is better to split ICollection, not IEnumerable (ICollection has Count). Second: it is possible not to call Skip for every chunk. – rtf_leg Apr 01 '15 at 11:23
  • @ElektroStudios I test it myself, everything is fine with your new example – Szer Apr 01 '15 at 11:23
  • @rtf_leg yes, you are right. Added disclaimer to my answer – Szer Apr 01 '15 at 11:30