325

I am attempting to split a list into a series of smaller lists.

My problem: my function to split lists doesn't split them into lists of the correct size. It should split them into lists of size 30, but instead it splits them into lists of size 114.

How can I make my function split a list into X lists of size 30 or less?

public static List<List<float[]>> splitList(List <float[]> locations, int nSize=30) 
{       
    List<List<float[]>> list = new List<List<float[]>>();

    for (int i=(int)(Math.Ceiling((decimal)(locations.Count/nSize))); i>=0; i--) {
        List <float[]> subLocat = new List <float[]>(locations); 

        if (subLocat.Count >= ((i*nSize)+nSize))
            subLocat.RemoveRange(i*nSize, nSize);
        else subLocat.RemoveRange(i*nSize, subLocat.Count-(i*nSize));

        Debug.Log ("Index: "+i.ToString()+", Size: "+subLocat.Count.ToString());
        list.Add (subLocat);
    }

    return list;
}

If I use the function on a list of size 144 then the output is:

Index: 4, Size: 120
Index: 3, Size: 114
Index: 2, Size: 114
Index: 1, Size: 114
Index: 0, Size: 114

fubo
sazr
  • If a LINQ solution is acceptable, [this question may be of some help](http://stackoverflow.com/questions/419019/split-list-into-sublists-with-linq). –  Jul 13 '12 at 03:28
  • Specifically Sam Saffron's answer on that previous question. And unless this is for a school assignment, I would just use his code and stop. – jcolebrand Jul 13 '12 at 03:35

21 Answers

551

I would suggest using this extension method to chunk the source list into sub-lists by a specified chunk size:

/// <summary>
/// Helper methods for the lists.
/// </summary>
public static class ListExtensions
{
    public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize) 
    {
        return source
            .Select((x, i) => new { Index = i, Value = x })
            .GroupBy(x => x.Index / chunkSize)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();
    }
}

For example, if you chunk a list of 18 items by 5 items per chunk, it gives you a list of 4 sub-lists with the following item counts inside: 5-5-5-3.
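A quick self-contained sketch of that example (the class name and the console output are just for illustration; the grouping is inlined so the snippet compiles on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ChunkDemo
{
    static void Main()
    {
        // 18 items chunked by 5, using the same index/GroupBy approach as above
        List<int> source = Enumerable.Range(1, 18).ToList();
        List<List<int>> chunks = source
            .Select((x, i) => new { Index = i, Value = x })
            .GroupBy(x => x.Index / 5)
            .Select(g => g.Select(v => v.Value).ToList())
            .ToList();

        // prints the sub-list sizes: 5-5-5-3
        Console.WriteLine(string.Join("-", chunks.Select(c => c.Count)));
    }
}
```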

NOTE: with the improvements to LINQ in .NET 6, chunking comes out of the box like this:

const int PAGE_SIZE = 5;

IEnumerable<Movie[]> chunks = movies.Chunk(PAGE_SIZE);
Dmitry Pavlov
    Before you use this in production, make sure you understand what the run-time implications for memory and performance are. Just because LINQ can be succinct, doesn't mean it's a good idea. – Nick Jun 19 '17 at 21:11
  • Definitely, @Nick I would suggest in general to think before doing anything. Chunking with LINQ should not be a frequent operation repeated thousands of times. Usually you need to chunk lists for processing items batch by batch and/or in parallel. – Dmitry Pavlov Jun 23 '17 at 12:13
  • I don't think this will keep the order though – Marc Jul 12 '17 at 13:08
  • @Marc TDD ;) test first! I mean - if you are not sure, you can always prove it doesn't with a test. – Dmitry Pavlov Jul 12 '17 at 19:21
  • I don't think memory and performance should be a big issue here. I happened to have a requirement of splitting a list with over 200,000 records into smaller lists with about 3000 each, which brought me to this thread, and I tested both methods and found the running time is almost the same. After that I tested splitting that list into lists with 3 records each and still the performance is OK. I do think Serj-Tm's solution is more straightforward and has better maintainability though. – Silent Sojourner Oct 11 '17 at 21:46
  • Note that it might be best to leave off the `ToList()`s, and let lazy evaluation do its magic. – Yair Halberstadt Mar 28 '18 at 07:45
  • This code ran out of memory and threw an exception on a list of 100M messages split into chunks of 10 for me – Iarek Jun 20 '18 at 11:26
  • @IarekKovtunenko well, with zillions of records you definitely should tune the algorithm for your specific needs. I would implement something like streams processing logic with buffer, which chunks records in 2 steps: 1) gets the first portion - any reasonable amount of records (e.g. 10K) and 2) chunks each within each portion. Do not hammer nails with a microscope - use the right tool for this task ;) – Dmitry Pavlov Jun 20 '18 at 17:54
  • @DmitryPavlov During *all* this time, I never knew about being able to project the index like that in a select statement! I thought it was a new feature till I noticed you posted this in 2014, that really surprised me! Thanks for sharing this. Also, wouldn't be better off to have this extension method available to an IEnumerable and also return an IEnumerable? – Aydin Jun 02 '19 at 03:11
  • You are welcome @Aydin - yeah, indexing in select allows to keep using `LINQ` instead of switching to `for` loop. Very helpful in some cases. – Dmitry Pavlov Jun 09 '19 at 22:12
  • Best to use MoreLinq Batch : https://stackoverflow.com/a/46405991/913845 – Zar Shardan Jun 21 '20 at 09:34
384
public static List<List<float[]>> SplitList(List<float[]> locations, int nSize=30)  
{        
    var list = new List<List<float[]>>(); 

    for (int i = 0; i < locations.Count; i += nSize) 
    { 
        list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i))); 
    } 

    return list; 
} 

Generic version:

public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize=30)  
{        
    for (int i = 0; i < locations.Count; i += nSize) 
    { 
        yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i)); 
    }  
} 
Serj-Tm
  • So if I have a List length zillion, and I want to split into smaller lists Length 30, and from every smaller list I only want to Take(1), then I still create lists of 30 items of which I throw away 29 items. This can be done smarter! – Harald Coppoolse Mar 05 '18 at 15:54
  • Does this actually work? Wouldn't it fail on the first split because you are getting the range nSize to nSize? For example if nSize is 3 and my array is size 5 then the first index range returned is `GetRange(3, 3)` – Matthew Pigram Mar 22 '18 at 01:11
  • @MatthewPigram tested and it's working. Math.Min takes the min value, so if the last chunk is less than nSize (2 < 3), it creates a list with the remaining items. – Phate01 Mar 22 '18 at 16:11
  • @HaraldCoppoolse the OP didn't ask for selecting, only to split lists – Phate01 Mar 22 '18 at 16:11
  • @MatthewPigram First iteration - GetRange(0,3), second iteration - GetRange(3,2) – Serj-Tm Mar 28 '18 at 01:22
  • yes it does work, I think I accidentally messed up the logic in my test – Matthew Pigram Mar 28 '18 at 05:48
  • In .NET, the naming convention is to start function and variable names with a capital letter :) Just to avoid being ugly crap like C++ :) – Tommix Jan 04 '20 at 17:29
  • Just wondering, wouldn't it be better to assign `locations.count` to a variable so it does not need to be recalculated again and again. Or is this optimized for you? – Jorn.Beyers Mar 25 '20 at 14:41
  • @Jorn.Beyers that might fall into the category of micro-optimizations. It's only a problem if it's a problem. Microsoft says that .Count is an O(1) operation, so I doubt you'd see any improvement by storing it in a variable: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.count?view=netcore-3.1 – user1666620 Sep 15 '20 at 13:54
  • I think it could be "i <= locations.Count", otherwise it will skip the last number/object. – user2404597 Jun 25 '21 at 01:03
58

Update for .NET 6

var originalList = new List<int>{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};

// split into arrays of no more than three
IEnumerable<int[]> chunks = originalList.Chunk(3);

Prior to .NET 6

public static IEnumerable<IEnumerable<T>> SplitIntoSets<T>
    (this IEnumerable<T> source, int itemsPerSet) 
{
    var sourceList = source as List<T> ?? source.ToList();
    for (var index = 0; index < sourceList.Count; index += itemsPerSet)
    {
        yield return sourceList.Skip(index).Take(itemsPerSet);
    }
}
Scott Hannen
55

How about:

while(locations.Any())
{    
    list.Add(locations.Take(nSize).ToList());
    locations = locations.Skip(nSize).ToList();
}
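Wrapped up as a complete method (a minimal sketch; the class and method names are my own, and I use a local `IEnumerable<T>` instead of mutating the parameter):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeSplitter
{
    // Splits a list into sub-lists of at most nSize items using Take/Skip.
    public static List<List<T>> SplitList<T>(List<T> locations, int nSize = 30)
    {
        var list = new List<List<T>>();
        IEnumerable<T> remaining = locations;

        while (remaining.Any())
        {
            list.Add(remaining.Take(nSize).ToList());   // grab the next chunk
            remaining = remaining.Skip(nSize).ToList(); // drop it from the remainder
        }

        return list;
    }
}
```

For a 144-item list and the default nSize of 30, this returns five lists with counts 30, 30, 30, 30, and 24.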
Rafal
  • Is this going to consume lots of memory? Each time locations.Skip.ToList happens I wonder if more memory is allocated and unskipped items are referenced by a new list. – Zasz Feb 12 '14 at 07:40
  • yes, a new list is created on every loop, and yes it consumes memory. But if you are having memory issues this is not the place to optimize, as instances of those lists are ready to be collected on the next loop. You can trade performance for memory by skipping the `ToList`, but I wouldn't bother trying to optimize it - it is trivial and unlikely to be a bottleneck. The main gain from this implementation is its triviality: it is easy to understand. If you want you can use the accepted answer; it does not create those lists but is a bit more complex. – Rafal Feb 12 '14 at 10:59
  • `.Skip(n)` iterates over `n` elements each time it's called, while this may be ok, it's important to consider for performance-critical code. http://stackoverflow.com/questions/20002975/performance-of-skip-and-similar-functions-like-take – Chakrava Aug 23 '16 at 16:44
  • @Chakrava sure, my solution is not to be used in performance critical code, yet in my experience you first write working code and then determine what is performance critical and it seldom where my linq to objects operations performed on say 50 objects. This should be evaluated case by case. – Rafal Aug 25 '16 at 09:32
  • @Rafal I agree, I've found numerous `.Skip()`s in my company's code base, and while they may not be "optimal" they work just fine. Things like DB operations take much longer anyways. But I think it's an important thing to note that `.Skip()` "touches" each element < n on its way instead of jumping to the nth-element directly (like you might expect). If your iterator has side-effects from touching an element `.Skip()` can be the cause of hard-to-find bugs. – Chakrava Aug 27 '16 at 16:10
51

The MoreLinq library has a method called Batch:

List<int> ids = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 }; // 10 elements
int counter = 1;
foreach(var batch in ids.Batch(2))
{
    foreach(var eachId in batch)
    {
        Console.WriteLine("Batch: {0}, Id: {1}", counter, eachId);
    }
    counter++;
}

The result is:

Batch: 1, Id: 1
Batch: 1, Id: 2
Batch: 2, Id: 3
Batch: 2, Id: 4
Batch: 3, Id: 5
Batch: 3, Id: 6
Batch: 4, Id: 7
Batch: 4, Id: 8
Batch: 5, Id: 9
Batch: 5, Id: 0

The ids are split into 5 chunks with 2 elements each.

devowiec
15

Serj-Tm's solution is fine; here is also a generic version as an extension method for lists (put it into a static class):

public static List<List<T>> Split<T>(this List<T> items, int sliceSize = 30)
{
    List<List<T>> list = new List<List<T>>();
    for (int i = 0; i < items.Count; i += sliceSize)
        list.Add(items.GetRange(i, Math.Min(sliceSize, items.Count - i)));
    return list;
} 
equintas
13

I find the accepted answer (Serj-Tm) the most robust, but I'd like to suggest a generic version.

public static List<List<T>> splitList<T>(List<T> locations, int nSize = 30)
{
    var list = new List<List<T>>();

    for (int i = 0; i < locations.Count; i += nSize)
    {
        list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
    }

    return list;
}
Linas
10

Addition after the very useful comment from mhand at the end.

Original answer

Although most solutions might work, I think they are not very efficient. Suppose you only want the first few items of the first few chunks. Then you wouldn't want to iterate over all (zillion) items in your sequence.

The following will at most enumerate the sequence twice: once for the Take and once for the Skip. It won't enumerate any more elements than you will use:

public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>
    (this IEnumerable<TSource> source, int chunkSize)
{
    while (source.Any())                     // while there are elements left
    {   // still something to chunk:
        yield return source.Take(chunkSize); // return a chunk of chunkSize
        source = source.Skip(chunkSize);     // skip the returned chunk
    }
}

How many times will this Enumerate the sequence?

Suppose you divide your source into chunks of chunkSize. You enumerate only the first N chunks. From every enumerated chunk you'll only enumerate the first M elements.

while (source.Any())
{
     ...
}

The Any() will get the enumerator, do one MoveNext(), and return the resulting value after disposing the enumerator. This will be done N times.

yield return source.Take(chunkSize);

According to the reference source this will do something like:

public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, int count)
{
    return TakeIterator<TSource>(source, count);
}

static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
{
    foreach (TSource element in source)
    {
        yield return element;
        if (--count == 0) break;
    }
}

This doesn't do a lot until you start enumerating over the fetched Chunk. If you fetch several Chunks, but decide not to enumerate over the first Chunk, the foreach is not executed, as your debugger will show you.

If you decide to take the first M elements of the first chunk then the yield return is executed exactly M times. This means:

  • get the enumerator
  • call MoveNext() and Current M times.
  • Dispose the enumerator

After the first chunk has been yield returned, we skip this first Chunk:

source = source.Skip(chunkSize);

Once again, we'll take a look at the reference source to find the SkipIterator:

static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
{
    using (IEnumerator<TSource> e = source.GetEnumerator()) 
    {
        while (count > 0 && e.MoveNext()) count--;
        if (count <= 0) 
        {
            while (e.MoveNext()) yield return e.Current;
        }
    }
}

As you see, the SkipIterator calls MoveNext() once for every element in the Chunk. It doesn't call Current.

So per Chunk we see that the following is done:

  • Any(): GetEnumerator; 1 MoveNext(); Dispose the enumerator;
  • Take():

    • nothing if the content of the chunk is not enumerated.
    • If the content is enumerated: GetEnumerator(), one MoveNext and one Current per enumerated item, Dispose the enumerator;

  • Skip(): for every chunk that is enumerated (NOT the contents of the chunk): GetEnumerator(), MoveNext() chunkSize times, no Current! Dispose the enumerator

If you look at what happens with the enumerator, you'll see that there are a lot of calls to MoveNext(), and only calls to Current for the TSource items you actually decide to access.

If you take N chunks of size chunkSize, then the calls to MoveNext() are:

  • N times for Any()
  • not yet any time for Take, as long as you don't enumerate the Chunks
  • N times chunkSize for Skip()

If you decide to enumerate only the first M elements of every fetched chunk, then you need to call MoveNext M times per enumerated Chunk.

The total

MoveNext calls: N + N*M + N*chunkSize
Current calls: N*M; (only the items you really access)

So if you decide to enumerate all elements of all chunks:

MoveNext: numberOfChunks + all elements + all elements = about twice the sequence
Current: every item is accessed exactly once

Whether MoveNext is a lot of work or not, depends on the type of source sequence. For lists and arrays it is a simple index increment, with maybe an out of range check.

But if your IEnumerable is the result of a database query, make sure that the data is really materialized on your computer, otherwise the data will be fetched several times. DbContext and Dapper will properly transfer the data to local process before it can be accessed. If you enumerate the same sequence several times it is not fetched several times. Dapper returns an object that is a List, DbContext remembers that the data is already fetched.

It depends on your repository whether it is wise to call AsEnumerable() or ToList() before you start to divide the items into chunks.
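The re-enumeration that the comments below warn about can be made visible with a small counter (a sketch: `CountingSource`, `Run`, and the `pulls` field are illustrative names of mine, and the exact count depends on the runtime's LINQ optimizations; `ChunkBy` is the method from this answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerationCountDemo
{
    // the lazy Take/Skip chunker from this answer
    public static IEnumerable<IEnumerable<T>> ChunkBy<T>(this IEnumerable<T> source, int chunkSize)
    {
        while (source.Any())
        {
            yield return source.Take(chunkSize);
            source = source.Skip(chunkSize);
        }
    }

    static int pulls; // counts every element the underlying iterator produces

    static IEnumerable<int> CountingSource(int n)
    {
        for (int i = 0; i < n; i++)
        {
            pulls++;          // one successful MoveNext() on the source
            yield return i;
        }
    }

    public static int Run()
    {
        pulls = 0;
        foreach (var chunk in CountingSource(10).ChunkBy(5))
            foreach (var _ in chunk) { } // enumerate every item of every chunk
        return pulls;
    }

    static void Main()
    {
        // 10 items end up being pulled well over 10 times, because Any(),
        // Take() and the chained Skip()s each re-enumerate the source.
        Console.WriteLine(EnumerationCountDemo.Run());
    }
}
```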

Harald Coppoolse
  • won't this enumerate twice *per* batch? so we're really enumerating the source `2*chunkSize` times? This is deadly depending on the source of the enumerable (perhaps DB backed, or other non-memoized source). Imagine this enumerable as input `Enumerable.Range(0, 10000).Select(i => DateTime.UtcNow)` -- you'll get different times every time you enumerate the enumerable since it's not memoized – mhand Mar 20 '18 at 22:11
  • Consider: `Enumerable.Range(0, 10).Select(i => DateTime.UtcNow)`. By invoking `Any` you'll be recomputing the current time each time. Not so bad for `DateTime.UtcNow`, but consider an enumerable backed by a database connection/sql cursor or similar. I've seen cases where thousands of DB calls were issued because the developer didn't understand the potential repercussions of 'multiple enumerations of an enumerable' -- [ReSharper](https://www.jetbrains.com/help/resharper/PossibleMultipleEnumeration.html) provides a hint for this as well – mhand Nov 10 '18 at 20:18
9

While plenty of the answers above do the job, they all fail horribly on a never-ending sequence (or a really long one). The following is a completely online implementation which guarantees the best time and memory complexity possible. We only iterate the source enumerable exactly once and use yield return for lazy evaluation. The consumer could throw away the list on each iteration, making the memory footprint equal to that of a list with batchSize elements.

public static IEnumerable<List<T>> BatchBy<T>(this IEnumerable<T> enumerable, int batchSize)
{
    using (var enumerator = enumerable.GetEnumerator())
    {
        List<T> list = null;
        while (enumerator.MoveNext())
        {
            if (list == null)
            {
                list = new List<T> {enumerator.Current};
            }
            else if (list.Count < batchSize)
            {
                list.Add(enumerator.Current);
            }
            else
            {
                yield return list;
                list = new List<T> {enumerator.Current};
            }
        }

        if (list?.Count > 0)
        {
            yield return list;
        }
    }
}

EDIT: Just now realized the OP asks about breaking a List<T> into smaller List<T>s, so my comments regarding infinite enumerables aren't applicable to the OP, but may help others who end up here. These comments were in response to other posted solutions that do use IEnumerable<T> as an input to their function, yet enumerate the source enumerable multiple times.

mhand
  • I think the `IEnumerable<IEnumerable<T>>` version is better as it doesn't involve so much `List` construction. – NetMage Apr 10 '18 at 19:01
  • @NetMage - one issue with `IEnumerable<IEnumerable<T>>` is that the implementation is likely to rely on the consumer fully enumerating each inner enumerable yielded. I'm sure a solution could be phrased in a way to avoid that issue, but I think the resulting code could get complex pretty quickly. Also, since it's lazy, we're only generating a single list at a time and memory allocation happens exactly once per list since we know the size up front. – mhand Nov 10 '18 at 20:14
  • You are right - my implementation uses a new type of enumerator (a Position Enumerator) that tracks your current position wrapping a standard enumerator and let's you move to a new position. – NetMage Nov 12 '18 at 18:52
8

I have a generic method that can take any type, including float, and it's been unit-tested; hope it helps:

    /// <summary>
    /// Breaks the list into groups with each group containing no more than the specified group size
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="values">The values.</param>
    /// <param name="groupSize">Size of the group.</param>
    /// <param name="maxCount">Optional cap on the number of source items to process; null means no cap.</param>
    /// <returns></returns>
    public static List<List<T>> SplitList<T>(IEnumerable<T> values, int groupSize, int? maxCount = null)
    {
        List<List<T>> result = new List<List<T>>();
        // Quick and special scenario
        if (values.Count() <= groupSize)
        {
            result.Add(values.ToList());
        }
        else
        {
            List<T> valueList = values.ToList();
            int startIndex = 0;
            int count = valueList.Count;
            int elementCount = 0;

            while (startIndex < count && (!maxCount.HasValue || (maxCount.HasValue && startIndex < maxCount)))
            {
                elementCount = (startIndex + groupSize > count) ? count - startIndex : groupSize;
                result.Add(valueList.GetRange(startIndex, elementCount));
                startIndex += elementCount;
            }
        }


        return result;
    }
Tianzhen Lin
  • Thanks. Wonder if you could update the comments with the maxCount parameter definition? A safety net? – Andrew Jens Mar 21 '16 at 22:54
  • be careful with multiple enumerations of the enumerable. `values.Count()` will cause a full enumeration and then `values.ToList()` another. Safer to do `values = values.ToList()` so it's already materialized. – mhand Mar 20 '18 at 19:29
4

As of .NET 6.0, you can use the LINQ extension Chunk<T>() to split enumerations into chunks (see the docs).

var chars = new List<char>() { 'h', 'e', 'l', 'l', 'o', 'w','o','r' ,'l','d' };
foreach (var batch in chars.Chunk(2))
{
    foreach (var ch in batch)
    {
        // iterates 2 letters at a time
    }
}
olabacker
3
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
{
    return items.Select((item, index) => new { item, index })
                .GroupBy(x => x.index / maxItems)
                .Select(g => g.Select(x => x.item));
}
Codester
  • instead of `.Select(g => g.Select(x => x.item));` can we send it to a `class` like `.Select(g => g.Select(x => new { v = x.item}));` ? – hiFI Oct 05 '21 at 07:05
2

How about this one? The idea was to use only one loop. And, who knows, maybe you're using only IList implementations throughout your code and you don't want to cast to List.

private IEnumerable<IList<T>> SplitList<T>(IList<T> list, int chunkSize)
{
    IList<T> auxList = new List<T>();
    int totalItems = list.Count;

    if (chunkSize <= 0)
    {
        yield return auxList;
    }
    else 
    {
        for (int i = 0; i < totalItems; i++)
        {
            auxList.Add(list[i]);

            if ((i + 1) % chunkSize == 0)
            {
                yield return auxList;
                auxList = new List<T>();
            }
            else if (i == totalItems - 1)
            {
                yield return auxList;
            }
        }
    }
}
2

In .NET 6 you can just use source.Chunk(chunkSize)

A more generic version based on the accepted answer by Serj-Tm.

    public static IEnumerable<IEnumerable<T>> Split<T>(IEnumerable<T> source, int size = 30)
    {
        var count = source.Count();
        for (int i = 0; i < count; i += size)
        {
            yield return source
                .Skip(i)
                .Take(size);
        }
    }
XzaR
  • `IEnumerable` sources should not be enumerated more than once. It's not guaranteed that each enumeration is cheap, or that a subsequent enumeration will yield the same items as the previous enumeration. – Theodor Zoulias Nov 30 '21 at 09:55
1

One more

public static IList<IList<T>> SplitList<T>(this IList<T> list, int chunkSize)
{
    var chunks = new List<IList<T>>();
    List<T> chunk = null;
    for (var i = 0; i < list.Count; i++)
    {
        if (i % chunkSize == 0)
        {
            chunk = new List<T>(chunkSize);
            chunks.Add(chunk);
        }
        chunk.Add(list[i]);
    }
    return chunks;
}
1
public static List<List<T>> ChunkBy<T>(this List<T> source, int chunkSize)
{
    var result = new List<List<T>>();
    for (int i = 0; i < source.Count; i += chunkSize)
    {
        var rows = new List<T>();
        for (int j = i; j < i + chunkSize; j++)
        {
            if (j >= source.Count) break;
            rows.Add(source[j]);
        }
        result.Add(rows);
    }
    return result;
}
Baskovli
0
List<int> originalList = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12 };
Dictionary<int, List<int>> dic = new Dictionary<int, List<int>>();
int batchCount = originalList.Count / 2; // to split the list into 2 parts; if you want three, divide by three
int threadId = 0;
List<int> lst = new List<int>();
for (int i = 0; i < originalList.Count; i++)
{
    lst.Add(originalList[i]);
    if (i % batchCount == 0 && i != 0)
    {
        dic.Add(threadId, lst);
        lst = new List<int>();
        threadId++;
    }
}
if (lst.Count > 0)
    dic.Add(threadId, lst); // in case any items are left over
foreach (int batchId in dic.Keys)
{
    Console.WriteLine("BatchId: " + batchId);
    Console.WriteLine("Batch Count: " + dic[batchId].Count);
}
0

I had encountered this same need, and I used a combination of LINQ's Skip() and Take() methods. I multiply the number I take by the number of iterations thus far, and that gives me the number of items to skip; then I take the next group.

        var categories = Properties.Settings.Default.MovementStatsCategories;
        var items = summariesWithinYear
            .Select(s =>  s.sku).Distinct().ToList();

        //need to run by chunks of 10,000
        var count = items.Count;
        var counter = 0;
        var numToTake = 10000;

        while (count > 0)
        {
            var itemsChunk = items.Skip(numToTake * counter).Take(numToTake).ToList();
            counter += 1;

            MovementHistoryUtilities.RecordMovementHistoryStatsBulk(itemsChunk, categories, nLogger);

            count -= numToTake;
        }
Becca
0

Based on Dmitry Pavlov's answer, I would remove .ToList() and also avoid the anonymous class. Instead, I like to use a struct, which does not require a heap memory allocation. (A ValueTuple would also do the job.)

public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    if (source is null)
    {
        throw new ArgumentNullException(nameof(source));
    }
    if (chunkSize <= 0)
    {
        throw new ArgumentOutOfRangeException(nameof(chunkSize), chunkSize, "The argument must be greater than zero.");
    }

    return source
        .Select((x, i) => new ChunkedValue<TSource>(x, i / chunkSize))
        .GroupBy(cv => cv.ChunkIndex)
        .Select(g => g.Select(cv => cv.Value));
} 

[StructLayout(LayoutKind.Auto)]
[DebuggerDisplay("{" + nameof(ChunkedValue<T>.ChunkIndex) + "}: {" + nameof(ChunkedValue<T>.Value) + "}")]
private struct ChunkedValue<T>
{
    public ChunkedValue(T value, int chunkIndex)
    {
        this.ChunkIndex = chunkIndex;
        this.Value = value;
    }

    public int ChunkIndex { get; }

    public T Value { get; }
}

This can be used as follows; it only iterates over the collection once and does not allocate any significant memory:

int chunkSize = 30;
foreach (var chunk in collection.ChunkBy(chunkSize))
{
    foreach (var item in chunk)
    {
        // your code for item here.
    }
}

If a concrete list is actually needed then I would do it like this:

int chunkSize = 30;
var chunkList = new List<List<T>>();
foreach (var chunk in collection.ChunkBy(chunkSize))
{
    // create a list with the correct capacity to be able to contain one chunk
    // to avoid the resizing (additional memory allocation and memory copy) within the List<T>.
    var list = new List<T>(chunkSize);
    list.AddRange(chunk);
    chunkList.Add(list);
}
TiltonJH
0

In case you want to split the list by a condition instead of a fixed number:

///<summary>
/// splits a list based on a condition (similar to the split function for strings)
///</summary>
public static IEnumerable<List<T>> Split<T>(this IEnumerable<T> src, Func<T, bool> pred)
{
    var list = new List<T>();
    foreach (T item in src)
    {
        if (pred(item))
        {
            if (list.Count > 0)
                yield return list;

            list = new List<T>();
        }
        else
        {
            list.Add(item);
        }
    }

    if (list.Count > 0) // yield the items after the last separator
        yield return list;
}
Josef
0

You can simply try the following code, using only LINQ:

public static IList<IList<T>> Split<T>(IList<T> source)
{
    return  source
        .Select((x, i) => new { Index = i, Value = x })
        .GroupBy(x => x.Index / 3)
        .Select(x => x.Select(v => v.Value).ToList())
        .ToList();
}
sajadre