
I'm trying to chunk up items in a custom collection class that implements IEnumerable (and ICollection) in C# 2.0. For example, say I only want 1000 items at a time and I have 3005 items in my collection. I have a working solution, shown below, but it seems so primitive that I figure there has to be a better way to do this.

Here's what I have. For example's sake I'm using C# 3.0's Enumerable and var; just mentally replace those with a custom class:

```csharp
using System.Collections.Generic;
using System.Linq;

var items = Enumerable.Range(0, 3005).ToList();
int count = items.Count;
int currentCount = 0, limit = 0, iteration = 1;

List<int> temp = new List<int>();

while (currentCount < count)
{
    limit = count - currentCount;

    if (limit > 1000)
    {
        limit = 1000 * iteration;
    }
    else
    {
        limit += 1000 * (iteration - 1);
    }
    for (int i = currentCount; i < limit; i++)
    {
        temp.Add(items[i]);
    }

    //do something with temp

    currentCount += temp.Count;
    iteration++;
    temp.Clear();
}
```
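The `limit` bookkeeping in the loop above always works out to `Math.Min(1000 * iteration, count)`, because `currentCount` equals `1000 * (iteration - 1)` at the start of every pass. A minimal sketch of that index arithmetic on its own, assuming a hypothetical `ChunkBounds.Bounds` helper that returns each chunk's half-open `[start, end)` range:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkBounds
{
    // Returns the half-open [start, end) index range of every chunk in a collection
    // of `count` items. Each chunk starts chunkSize after the previous one, and its
    // end is clamped with Math.Min so the last chunk can be shorter.
    public static List<KeyValuePair<int, int>> Bounds(int count, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        List<KeyValuePair<int, int>> bounds = new List<KeyValuePair<int, int>>();
        for (int start = 0; start < count; start += chunkSize)
        {
            int end = Math.Min(start + chunkSize, count);
            bounds.Add(new KeyValuePair<int, int>(start, end));
        }
        return bounds;
    }
}
```

For `Bounds(3005, 1000)` this gives four ranges, [0, 1000), [1000, 2000), [2000, 3000) and [3000, 3005), so the final chunk holds 5 items.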

Can anyone suggest a more elegant way of doing this in C# 2.0? I know that if this project were from the past five years I could use LINQ (as demonstrated here and here). My method works, but I'd rather not have my name associated with such ugly (in my opinion) code.

Thanks.

Sven Grosen
  • If your existing code works then this should be on a site such as programmers or code review, not here on stackoverflow. – Servy Aug 29 '12 at 21:00
  • Ah, I didn't know about the code review site until you mentioned it. I'll post it there too, thanks. – Sven Grosen Aug 29 '12 at 21:02
  • You shouldn't cross post questions. You should delete the question here (or wait for it to be closed) if you're going to re-post it. – Servy Aug 29 '12 at 21:03
  • Link to your other question though, as I was part way through writing an answer here. – Jon Hanna Aug 29 '12 at 21:05
  • Ah, I just added. We can move the whole thing. Edit: Or we could, but codereview isn't in the list of sites it suggests moving it to :( – Jon Hanna Aug 29 '12 at 21:12
  • I'll just leave it here since it got some attention right away. Next time I run into a "smelly" code situation, I'll post it to code review. – Sven Grosen Aug 29 '12 at 21:34
  • duplicate of [Split List into Sublists with LINQ](http://stackoverflow.com/q/419019/58678) which has non linq implementations too. – hIpPy Aug 30 '12 at 17:13

2 Answers


First, `yield` is your friend here, and it was introduced in C# 2.0. Consider:

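```csharp
// Requires using System.Collections.Generic; declare inside a (static) class.
public static IEnumerable<List<T>> Chunk<T>(IEnumerable<T> source, int chunkSize)
{
  List<T> list = new List<T>(chunkSize);
  foreach(T item in source)
  {
    list.Add(item);
    if(list.Count == chunkSize)
    {
      // Hand out the full chunk, then start a fresh list instead of clearing this one,
      // so the caller's reference to the yielded chunk is never modified.
      yield return list;
      list = new List<T>(chunkSize);
    }
  }
  //don't forget the last one!
  if(list.Count != 0)
    yield return list;
}
```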

That makes it flexible in both element type and chunk size, so it's nicely reusable. The only limitation that C# 2.0 imposes is that we can't make it an extension method (extension methods arrived in C# 3.0).
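From C# 3.0 on, the same iterator can be exposed as an extension method. A sketch under that assumption: the `ChunkExtensions.InChunks` name is hypothetical, chosen so it doesn't collide with the `Enumerable.Chunk` method that .NET 6 added to LINQ. The argument checks are split out of the iterator so that they run when the method is called rather than on the first `MoveNext()`:

```csharp
using System;
using System.Collections.Generic;

public static class ChunkExtensions
{
    // C# 3.0+ only: the `this` modifier makes it callable as items.InChunks(1000).
    // Not an iterator itself, so these checks throw immediately at the call site.
    public static IEnumerable<List<T>> InChunks<T>(this IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");
        return InChunksIterator(source, chunkSize);
    }

    // Same logic as the Chunk method above: fill a list, yield it when full,
    // start a new one, and yield any leftover partial chunk at the end.
    private static IEnumerable<List<T>> InChunksIterator<T>(IEnumerable<T> source, int chunkSize)
    {
        List<T> list = new List<T>(chunkSize);
        foreach (T item in source)
        {
            list.Add(item);
            if (list.Count == chunkSize)
            {
                yield return list;
                list = new List<T>(chunkSize);
            }
        }
        if (list.Count != 0)
            yield return list;
    }
}
```

Calling `items.InChunks(1000)` on the 3005-item list from the question yields lists of 1000, 1000, 1000 and 5 items.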

Jon Hanna
  • Awesome, this is just what I needed. I feel ashamed for not remembering that I still had access to generics and yield in C# 2.0, especially since I just finished Jon Skeet's C# in Depth 2nd Edition last week. I still had to massage the data a little bit to get it to work with the API I have to use, but this got me most of the way there. Thanks. – Sven Grosen Aug 29 '12 at 21:33
  • I was such a fiend for writing `IEnumerator` and `IEnumerable` implementations in 1.1, that it's a change I'll never forget. – Jon Hanna Aug 29 '12 at 21:37
  • I need to use `yield` more. :) I like this implementation because it does not hold up processing of the list when `list.Count` hits `chunkSize`. It's also loosely coupled, since the `ProcessChunk(items)` call is not in it. Very nice. – hIpPy Aug 29 '12 at 21:53
  • @hIpPy What was going to be the "secondly" in this answer, but I decided to leave it, was to `yield` a call to a method that `yield`s individual items, so that `Chunk` becomes `IEnumerable<IEnumerable<T>>`. If you want to stretch your `yield` skillz, then implementing that is a worthwhile exercise - just complicated enough to need a bit of stretching, without being crazy. – Jon Hanna Aug 29 '12 at 22:13
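One possible take on the exercise Jon Hanna describes in the comment above, offered as a sketch rather than a definitive implementation: each chunk is a lazy `IEnumerable<T>` that pulls from a single shared enumerator, so no per-chunk list is built. The `LazyChunker` class and its method names are hypothetical, and the sketch assumes the caller enumerates each chunk completely, and only once, before moving on to the next; skipping or re-reading a chunk would desynchronize the shared enumerator.

```csharp
using System;
using System.Collections.Generic;

public static class LazyChunker
{
    // Yields one lazy chunk per group of chunkSize source items.
    // Because this is an iterator, the argument check only runs on the first MoveNext().
    public static IEnumerable<IEnumerable<T>> Chunk<T>(IEnumerable<T> source, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        using (IEnumerator<T> e = source.GetEnumerator())
        {
            // Each successful MoveNext() here positions e on the first item of a new chunk.
            while (e.MoveNext())
            {
                yield return TakeChunk(e, chunkSize);
            }
        }
    }

    // Yields the current item, then up to chunkSize - 1 more from the shared enumerator.
    // The i < chunkSize test comes first, so a full chunk never consumes the next chunk's first item.
    private static IEnumerable<T> TakeChunk<T>(IEnumerator<T> e, int chunkSize)
    {
        yield return e.Current;
        for (int i = 1; i < chunkSize && e.MoveNext(); i++)
        {
            yield return e.Current;
        }
    }
}
```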

There are several ways you could approach this.

If you just want to associate each item with the index of the chunk it belongs to:

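```csharp
// CHUNK_SIZE is assumed to be a constant (e.g. 1000) and ProcessItem your own handler.
int processed = 0;
foreach (int item in items)
{
    int chunkIndex = processed++ / CHUNK_SIZE;
    ProcessItem(item, chunkIndex);
}
```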
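The same index arithmetic can be wrapped up as a reusable generic method. A sketch, assuming a hypothetical `ChunkItemHandler<T>` delegate that stands in for `ProcessItem` (.NET 2.0 only ships the single-argument `Action<T>`, so a two-argument callback needs its own delegate type):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical callback type: receives each item plus the 0-based index of its chunk.
public delegate void ChunkItemHandler<T>(T item, int chunkIndex);

public static class ChunkIndexer
{
    public static void ForEachWithChunkIndex<T>(IEnumerable<T> source, int chunkSize, ChunkItemHandler<T> handler)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        int processed = 0;
        foreach (T item in source)
        {
            // Integer division: items 0..chunkSize-1 get index 0, the next chunkSize get 1, etc.
            handler(item, processed++ / chunkSize);
        }
    }
}
```

With an anonymous method (also new in C# 2.0) the call looks like `ChunkIndexer.ForEachWithChunkIndex(items, 1000, delegate(int item, int chunkIndex) { /* ... */ });`.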

If you want to process items in batches, but don't need the whole chunk collection at once:

```csharp
int processed = 0, count = items.Count;
List<int> chunk = new List<int>(CHUNK_SIZE);
foreach (int item in items)
{
    chunk.Add(item);
    // Flush on every full chunk, and also after the very last item so a partial chunk isn't lost.
    if (++processed % CHUNK_SIZE == 0 || processed == count) {
        ProcessChunk(chunk);
        chunk.Clear();
    }
}
```
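A reusable variant of this batching loop, sketched under the assumption that the caller supplies the processing step as an `Action<List<T>>` (a hypothetical stand-in for `ProcessChunk`; the `Batcher.ForEachBatch` name is also hypothetical). Flushing the last partial batch after the loop, instead of comparing against `count`, means the source no longer has to expose a `Count`, so any `IEnumerable<T>` works:

```csharp
using System;
using System.Collections.Generic;

public static class Batcher
{
    // Hands each full batch to `process`, then flushes the final partial batch after the loop,
    // so the total number of items never needs to be known up front.
    public static void ForEachBatch<T>(IEnumerable<T> source, int batchSize, Action<List<T>> process)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        List<T> batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                process(batch);
                // The same list instance is reused, so `process` must not hold on to it.
                batch.Clear();
            }
        }
        if (batch.Count > 0)
        {
            process(batch);
        }
    }
}
```

Because `batch` is cleared and reused, as in the loop above, a callback that needs the items after it returns should copy them first, for example with `new List<T>(batch)`.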

If you want to have all chunks as a list of lists:

```csharp
int processed = 0;
List<List<int>> chunks = new List<List<int>>();
foreach (int item in items)
{
    int chunkIndex = processed++ / CHUNK_SIZE;
    // Start a new inner list whenever the item belongs to a chunk that doesn't exist yet.
    if (chunks.Count == chunkIndex) {
        chunks.Add(new List<int>(CHUNK_SIZE));
    }

    chunks[chunkIndex].Add(item);
}
```
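When the source is already a `List<T>` (as in the question's example), the list of lists can also be built by slicing with `List<T>.GetRange`, which has been available since .NET 2.0. A sketch; the `ListChunker.SplitList` name is hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class ListChunker
{
    // Splits an indexable List<T> into consecutive sublists of at most chunkSize items.
    public static List<List<T>> SplitList<T>(List<T> source, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        List<List<T>> chunks = new List<List<T>>();
        for (int start = 0; start < source.Count; start += chunkSize)
        {
            // The last slice is shortened so it never runs past the end of the list.
            int length = Math.Min(chunkSize, source.Count - start);
            chunks.Add(source.GetRange(start, length));
        }
        return chunks;
    }
}
```

`GetRange` makes a shallow copy of each slice, so adding to or removing from a chunk does not affect `source`; the elements themselves are still shared when `T` is a reference type.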
Jon
  • Your answer was good, but the answer provided by @Jon Hanna was a little more elegant (and potentially more reusable). – Sven Grosen Aug 30 '12 at 18:30