1

I really like the algorithm shown below for splitting a list into sublists of a fixed size. It might not be the most an efficient algorithm (edit: at all).

I'd like something that has a good balance of readability, elegance, and performance. The problem is, most algorithms I find in C# require the yield keyword, which is not available if you're using .NET 3.5 in Visual Studio 2010 ;)

public IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("list");

    if (size < 1)
        throw new ArgumentOutOfRangeException("size");

    int index = 1;
    IEnumerable<T> partition = source.Take(size).AsEnumerable();

    while (partition.Any())
    {
        yield return partition;
        partition = source.Skip(index++ * size).Take(size).AsEnumerable();
    }
}

I tried rewriting this in VB, but had to use a second list to collect results into, which ended up taking significantly more time than the implementation above.

I'm looking for another algorithm I could use in VB.NET, but most of the results run into the issue of having to basically load everything into the memory instead of the ability to dynamically produce the results a la generators in python. Not an huge issue, but not as ideal as it would be with yield return.

Is there a good, recommended algorithm for doing this in VB.NET? Would I have to create something implementing IEnumerator to generate the results on demand?

Jeff B
  • 8,572
  • 17
  • 61
  • 140
  • 1
    I believe yield was introduced in .NET 2.0 – Mitch Wheat Dec 12 '13 at 23:58
  • possible duplicate of [Yield in VB.NET](http://stackoverflow.com/questions/97381/yield-in-vb-net) – Jim Mischel Dec 13 '13 at 00:37
  • @MitchWheat I don't think so... It's not in this [list of VB keywords](http://msdn.microsoft.com/en-us/library/dd409611(v=vs.100).aspx) in VS2010 at least... – Jeff B Dec 13 '13 at 01:31
  • I was referring to c#. Sorry. – Mitch Wheat Dec 13 '13 at 01:35
  • In my defense against the [suggested duplicate](http://stackoverflow.com/questions/97381/yield-in-vb-net), my question is about what would be a good choice of algorithm given the limitation of not having `yield`... not how to create my own version of yield to translate the example algorithm in my question. – Jeff B Dec 13 '13 at 03:54
  • @MarcinJuraszek how would Skip/Take become n^2? Would it not be at most n*log(n)? – Wayne Werner Dec 13 '13 at 17:32
  • I don't think the algorithm you've copied is good. Because of Skip/Take combination there is a chance, that it's O(n*log(n)) solution (depends how IEnumerable is implemented on source parameter), which is quite bad. – MarcinJuraszek Dec 13 '13 at 20:11
  • 1
    @JeffBridgman I've just written a blogpost about partitioning method and performance. Check it out: [Partitioning the collection using LINQ: different approaches, different performance, the same result](http://mjuraszek.blogspot.com/2013/12/partitioning-collection-using-linq.html). The difference is huge! – MarcinJuraszek Dec 15 '13 at 23:36

3 Answers3

2

This might work as a work around. Make the sub routine a Sub and pass the target list in. now you can add the sub lists to it directly without creating a whole intermediary object first.

Dim testlist As New List(Of List(Of Integer))
Partition(Of Integer)({1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0}, 4, testlist)

Public Sub Partition(Of T)(source As IEnumerable(Of T), size As Integer, ByRef input As List(Of List(Of T)))
    If source Is Nothing Then
        Throw New ArgumentNullException("list")
    End If
    If size < 1 Then
        Throw New ArgumentOutOfRangeException("size")
    End If
    For i = 0 To source.Count - 1 Step size
        input.Add(source.Skip(i).Take(size).ToList())
    Next
End Sub
tinstaafl
  • 6,908
  • 2
  • 15
  • 22
2

Since I'm using the partitions locally, I ended up chosing to invert the situation: instead of passing the partitions to the action using it, I can pass the action to the function doing the partitioning.

Public Sub DoForPartition(Of T)(source As IEnumerable(Of T), 
                                size As Integer, 
                                doThis As Action(Of IEnumerable(Of T)))

    Dim partition(size - 1) As T
    Dim count = 0

    For Each t in source
        partition(count) = t
        count += 1

        If count = size Then
            doThis(partition)
            count = 0
        End If
    Next

    If count > 0 Then
        Array.Resize(partition, count)
        doThis(partition)
    End If
End Sub

This function avoids looping through the source multiple times and the only memory overhead is the size of the partition (instead of the entire source like some of the other options). I didn't write this function myself, but adapted a similarly looking C# function from this answer.

This looks like a much better algorithm than the one in my question.

Community
  • 1
  • 1
Jeff B
  • 8,572
  • 17
  • 61
  • 140
0

Lacking the Yield, you can do the following. I'm assuming you can convert to VB

public IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("list");

    if (size < 1)
        throw new ArgumentOutOfRangeException("size");

    List<List<T>> partitions = new List<List<T>>();

    int index = 1;
    IEnumerable<T> partition = source.Take(size).AsEnumerable();

    while (partition.Any())
    {
        partitions.Add(partition);
        partition = source.Skip(index++ * size).Take(size).AsEnumerable();
    }

    return partitions;
}

Note that this isn't an exact translation. The original code returns one partition at a time. This code creates all of the partitions and then returns them all in a list. Not a problem if the source isn't huge.

That said, it will look like the same thing to your code. That is, if you have:

foreach (IEnumerable<Foo> part in Partition(FooList, 100))
{
    // whatever
}

both versions will do essentially the same thing. The real difference is that my translation above will work as though you had written:

foreach (IEnumerable<Foo> part in Partition(FooList, 100).ToList())

As somebody pointed out in comments, this isn't the most efficient way to do things because the Skip could end up having to iterate over items multiple times. Again, if the source list isn't huge, then that probably isn't a problem. And it's likely that Skip is optimized to do direct indexing for things that implement IList.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351