1

I'm asking this question again, since the last time I asked it, it was falsely marked as duplicated. I am going to include some more information this time, that might make it easier to understand my need (it may very well have been my own fault for not defining the question properly).

I'm trying to split a list of a generic type into 4 lists. For simplicity and understanding, I will use a list of integers in this example, but that shouldn't make a difference.

I have done a lot of searching, found multiple answers like "Split List into Sublists with LINQ", using batch methods to split, I have tried MoreLinq's Batch methods and so on. Those suggestions work fine for what they should, but they do not work the way I need them to.

If I have a list containing the following elements (integers ranging 1-25):

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25] 

Then what I need to do is make 4 lists with a variable number of elements in them, where the elements increment in the same list, instead of jumping to the next list with the next element.

[ 1,  2,  3,  4,  5,  6,  7]
[ 8,  9, 10, 11, 12, 13, 14]
[15, 16, 17, 18, 19, 20, 21]
[20, 21, 22, 23, 24, 25]

When using the solutions in either of the questions linked, with 4 "parts" as the parameter, I get lists like this (this is the example where the element jumps to the next list instead of just the next element of the list):

[1, 5,  9, 13, 17, 21, 25],
[2, 6, 10, 14, 18, 22, 26],
[3, 7, 11, 15, 19, 23, 27],
[4, 8, 12, 16, 20, 24]

or this (does the same as MoreLinq's Batch method)

[ 1,  2,  3,  4],
[ 5,  6,  7,  8],
[ 9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24],
[25, 26, 27],

So the first solution splits the list into 4 lists, but puts the elements in the wrong order. The second solution splits the lists in the right order, but not in the right length. In the last solution, he gets X amount of lists with 4 elements in each, where I need to have 4 lists with X elements in each.

Community
  • 1
  • 1
Loyalar
  • 2,131
  • 4
  • 24
  • 44

7 Answers7

10

You can use following extension method to split list on required number of sub-lists, and include additional items in first sub-list:

public static IEnumerable<List<T>> Split<T>(this List<T> source, int count)
{
    int rangeSize = source.Count / count;
    int firstRangeSize = rangeSize + source.Count % count;
    int index = 0;

    yield return source.GetRange(index, firstRangeSize);
    index += firstRangeSize;

    while (index < source.Count)
    {         
        yield return source.GetRange(index, rangeSize);
        index += rangeSize;
    }
}

With given input

var list = Enumerable.Range(1, 25).ToList();
var result = list.Split(4);

Result is

[
  [ 1, 2, 3, 4, 5, 6, 7 ],
  [ 8, 9, 10, 11, 12, 13 ],
  [ 14, 15, 16, 17, 18, 19 ],
  [ 20, 21, 22, 23, 24, 25 ]
]

UPDATE: This solution adds additional item to each range

public static IEnumerable<List<T>> Split<T>(this List<T> source, int count)
{
    int rangeSize = source.Count / count;
    int additionalItems = source.Count % count;
    int index = 0;

    while (index < source.Count)
    {   
        int currentRangeSize = rangeSize + ((additionalItems > 0) ? 1 : 0);
        yield return source.GetRange(index, currentRangeSize);
        index += currentRangeSize;
        additionalItems--;
    }
}
Sergey Berezovskiy
  • 232,247
  • 41
  • 429
  • 459
  • This is very close to what I need, but for some reason the remaining items above every 4 items are put into the first list, and not split into the next lists. For example, if there are 27 elements, I have lists of 9, 6, 6 and 6 in length, where they should probably be 7, 7, 7 and 6. – Loyalar Jun 17 '14 at 08:04
  • @Loyalar OK, it was not clear from sample output. I'll update answer – Sergey Berezovskiy Jun 17 '14 at 08:05
  • @Loyalar I added solution which will split source by 7,7,7,6 as you want – Sergey Berezovskiy Jun 17 '14 at 08:10
  • 1
    Does exactly what I need with the update. Thank you very much. – Loyalar Jun 17 '14 at 08:10
  • I wonder if there's a way to acheive this without enumerating the collection...? Probably not because we need to know the number of elements. – Nick Jun 17 '14 at 08:21
  • @Nick exactly, you should know total number of items before splitting them in equal batches. BTW IEnumerable can even be infinite.. in that case you will never get even first batch :) – Sergey Berezovskiy Jun 17 '14 at 08:24
  • What if `count < 1`?. If `source` is empty, shouldn't `count` empty lists be returned? You can use `Math.DivRem` to perform the calculations simultaneously. See my answer. @Loyalar – Jodrell Jun 17 '14 at 09:17
  • @Jodrell I didn't add parameters verification for simplicity. Source can be not only empty, it can be null. And your solution will throw `NullReferenceException`. Thanks for showing `Math.DivRem` it might be useful if performance really matters – Sergey Berezovskiy Jun 17 '14 at 09:21
  • @SergeyBerezovskiy, can it be `null` if this is an extension method? – Jodrell Jun 17 '14 at 09:34
  • 1
    @Jodrell of course it can. Extension method is a simple static method. I.e. your extension method call will be compiled to `Extensions.Segment(someCollection, 4)`. It's just syntax sugar. – Sergey Berezovskiy Jun 17 '14 at 09:40
2

Here is another solution based on IEnumerable<T>. It has the following characteristics:

  • Always yields batchCount items, if the source enumerable is smaller than the batch size, it will yield empty lists.
  • Favors larger lists at the front (e.g. when the batchCount is 2 and the size is 3, the length of the results will be [2,1].
  • Iterates multiple times of the IEnumerable. This means AsEnumerable should be called somewhere if something like an Entity Framework query is executed here.

The first example is optimized for List<T>

public static class BatchOperations
{
    public static IEnumerable<List<T>> Batch<T>(this List<T> items, int batchCount)
    {
        int totalSize = items.Count;
        int remain = totalSize % batchCount;
        int skip = 0;
        for (int i = 0; i < batchCount; i++)
        {
            int size = totalSize / batchCount + (i <= remain ? 1 : 0);
            if (skip + size > items.Count) yield return new List<T>(0);
            else yield return items.GetRange(skip, size);
            skip += size;
        }
    }

    public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int batchCount)
    {
        int totalSize = items.Count();
        int remain = totalSize%batchCount;
        int skip = 0;
        for (int i = 0; i < batchCount; i++)
        {
            int size = totalSize/batchCount + (i <= remain ? 1 : 0);
            yield return items.Skip(skip).Take(size);
            skip += size;
        }
    }
}
Bas
  • 26,772
  • 8
  • 53
  • 86
2

Sergey's answer is clearly the best for this, but for completeness here's a solution you could use if you did not want to make copies of the sublists for some reason (perhaps because you just had IEnumerable<T> as input):

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp1
{
    public static class EnumerableExt
    {
        public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> input, int blockCount, int count)
        {
            int blockSize = count/blockCount;
            int currentBlockSize = blockSize + count%blockSize;

            var enumerator = input.GetEnumerator();

            while (enumerator.MoveNext())
            {
                yield return nextPartition(enumerator, currentBlockSize);
                currentBlockSize = blockSize;
            }
        }

        private static IEnumerable<T> nextPartition<T>(IEnumerator<T> enumerator, int blockSize)
        {
            do
            {
                yield return enumerator.Current;
            }
            while (--blockSize > 0 && enumerator.MoveNext());
        }
    }

    class Program
    {
        private void run()
        {
            var list = Enumerable.Range(1, 25).ToList();
            var sublists = list.Partition(4, list.Count);

            foreach (var sublist in sublists)
                Console.WriteLine(string.Join(" ", sublist.Select(element => element.ToString())));
        }

        static void Main()
        {
            new Program().run();
        }
    }
}

I guess this would run much more slowly than using Lists, but it would use much less memory.

Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • Interesting solution (though it adds all 'additional' items to first block, but its easy to fix thus you are passing blockSize to second method) – Sergey Berezovskiy Jun 17 '14 at 08:20
1
        const int groupSize = 4;

        var items = new []{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25};

        var currentGroupIndex=-1;

        var step1 = items.Select(a =>{
            if (++currentGroupIndex >= groupSize)
                currentGroupIndex = 0;
            return new {Group = currentGroupIndex, Value = a};
        }).ToArray();


        var step2 = step1.GroupBy(a => a.Group).Select(a => a.ToArray()).ToArray();

        var group1 = step2[0].Select(a => a.Value).ToArray();
        var group2 = step2[1].Select(a => a.Value).ToArray();
        var group3 = step2[2].Select(a => a.Value).ToArray();
        var group4 = step2[3].Select(a => a.Value).ToArray();

What this does is it first introduces an counter (currentGroupIndex) which starts at zero and will be incremented for each element in the list. The index gets reset to zero when the group size has reached.
the variable step1 now contains items continaing a Group and a Value property.
The Group value is then used in the GroupBy statement.

AcidJunkie
  • 1,878
  • 18
  • 21
0

Take & Skip could be very helpful here i think but personally i like using Func to make these choices, makes the method more flexible.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Splitter
{
    class Program
    {
        static void Main(string[] args)
        {
            List<int> numbers = Enumerable.Range(1, 25).ToList();
            int groupCount = 4;

            var lists = numbers.Groupem(groupCount, (e, i) =>
            {
                // In what group do i wanna have this element.
                int divider = numbers.Count / groupCount;
                int overflow = numbers.Count % divider;
                int index = (i - overflow) / divider;
                return index < 0 ? 0 : index;
            });

            Console.WriteLine("numbers: {0}", numbers.ShowContent());

            Console.WriteLine("Parts");
            foreach (IEnumerable<int> list in lists)
            {
                Console.WriteLine("{0}", list.ShowContent());
            }
        }
    }

    public static class EnumerableExtensions
    {
        private static List<T>[] CreateGroups<T>(int size)
        {
            List<T>[] groups = new List<T>[size];

            for (int i = 0; i < groups.Length; i++)
            {
                groups[i] = new List<T>();
            }

            return groups;
        }

        public static void Each<T>(this IEnumerable<T> source, Action<T, int> action)
        {
            var i = 0;
            foreach (var e in source) action(e, i++);
        }

        public static IEnumerable<IEnumerable<T>> Groupem<T>(this IEnumerable<T> source, int groupCount, Func<T, int, int> groupPicker, bool failOnOutOfBounds = true)
        {
            if (groupCount <= 0) throw new ArgumentOutOfRangeException("groupCount", "groupCount must be a integer greater than zero.");

            List<T>[] groups = CreateGroups<T>(groupCount);

            source.Each((element, index) =>
            {
                int groupIndex = groupPicker(element, index);

                if (groupIndex < 0 || groupIndex >= groups.Length)
                {
                    // When allowing some elements to fall out, set failOnOutOfBounds to false
                    if (failOnOutOfBounds)
                    {
                        throw new Exception("Some better exception than this");
                    }
                }
                else
                {
                    groups[groupIndex].Add(element);
                }
            });

            return groups;
        }

        public static string ShowContent<T>(this IEnumerable<T> list)
        {
            return "[" + string.Join(", ", list) + "]";
        }
    }
}
SvenL
  • 739
  • 7
  • 20
0

How about this, includes parameter checking, works with an emtpy set.

Completes in two passes, should be fast, I haven't tested.

public static IList<Ilist<T>> Segment<T>(
        this IEnumerable<T> source,
        int segments)
{
    if (segments < 1)
    {
        throw new ArgumentOutOfRangeException("segments");
    }

    var list = source.ToList();
    var result = new IList<T>[segments];

    // In case the source is empty.
    if (!list.Any())
    {
        for (var i = 0; i < segments; i++)
        {
            result[i] = new T[0];
        }

        return result;
    }

    int remainder;
    var segmentSize = Math.DivRem(list.Count, segments, out remainder);
    var postion = 0;
    var segment = 0;
    while (segment < segments)
    {
        var count = segmentSize;
        if (remainder > 0)
        {
            remainder--;
            count++;
        }

        result[segment] = list.GetRange(position, count);
        segment++;
        position += count;
    }

    return result;
}
Jodrell
  • 34,946
  • 5
  • 87
  • 124
0

Here is an optimized and lightweight O(N) extension method solution

public static void Bifurcate<T>(this IEnumerable<T> _list, int amountOfListsOutputted, IList<IList<T>> outLists)
{
    var list = _list;

    var index = 0;
    outLists = new List<IList<T>>(amountOfListsOutputted);

    for (int i = 0; i < amountOfListsOutputted; i++)
    {
        outLists.Add(new List<T>());
    }

    foreach (var item in list)
    {
        outLists[index % amountOfListsOutputted].Add(item);

        ++index;
    }
}

Simply use it like this:

public static void Main()
{
    var list = new List<int>(1000);


    //Split into 2

    list.Bifurcate(2, out var twoLists);

    var splitOne = twoLists[0];
    var splitTwo = twoLists[1];

    // splitOne.Count == 500
    // splitTwo.Count == 500


    //Split into 3

    list.Bifurcate(3, out var threeLists);

    var _splitOne = twoLists[0];
    var _splitTwo = twoLists[1];
    var _splitThree = twoLists[2];

    // _splitOne.Count == 334
    // _splitTwo.Count = 333
    // _splitThree.Count == 333
}
Leaves
  • 35
  • 1
  • 9