49

Afternoon,

I need to split an array into smaller "chunks".

I am passing over about 1200 items and need to split them into easier-to-handle arrays of 100 items each, which I then need to process.

Could anyone please make some suggestions?

Colin Emonds
thatuxguy
    http://stackoverflow.com/a/733261/138772 – JAB Jun 26 '12 at 12:42
  • Well, I need to pass over 1200+ items to Amazon MWS, and their API only allows 100 at a time, so I need to split the array up. I am passing in the array using string[] amzProductAsins = GetProductAsin(); which gets the ASINs from my database and creates an array :) – thatuxguy Jun 26 '12 at 12:53
  • @ChrisGessler I am looking for the best, or most efficient, way. As I said, there are 1200+ products in the DB at present, and I need to try to get the lowest price using the Amazon MWS API, which only works in batches of 100 (max). Currently my code is failing because I am passing 1241 products at once lol :) – thatuxguy Jun 26 '12 at 14:57
    @thatuxguy - If performance is a top priority, I suggest either Array.Copy or ArraySegment<>. See my answer for details on both + performance results. – Chris Gessler Jun 26 '12 at 15:07
    .NET 6 new linq Chunk() method: https://stackoverflow.com/a/69625204/379279 – xhafan Feb 15 '22 at 09:42
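
For reference, the Chunk() method mentioned in the comment above makes this a one-liner on .NET 6 or later; a rough sketch using the array from the question:

// .NET 6+ only: Enumerable.Chunk (in System.Linq) splits a sequence into
// arrays of at most 100 items; the final chunk holds whatever is left over.
string[] amzProductAsins = GetProductAsin();   // the ~1200 ASINs from the question

foreach (string[] batch in amzProductAsins.Chunk(100))
{
    // send this batch of up to 100 ASINs to the API
}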

9 Answers

84

Array.Copy has been around since .NET 1.1 and does an excellent job of chunking arrays.

List.GetRange() would also be a good choice as mentioned in another answer.

string[] buffer;

for (int i = 0; i < source.Length; i += 100)
{
    int count = Math.Min(100, source.Length - i); // last chunk may be smaller
    buffer = new string[count];
    Array.Copy(source, i, buffer, 0, count);
    // process buffer
}

And to make an extension for it:

public static class Extensions
{
    public static T[] Slice<T>(this T[] source, int index, int length)
    {       
        T[] slice = new T[length];
        Array.Copy(source, index, slice, 0, length);
        return slice;
    }
}

And to use the extension:

string[] source = new string[] { 1200 items here };

// get the first 100
string[] slice = source.Slice(0, 100);

Update: I think you might want ArraySegment<>. There's no need for performance checks, because it simply uses the original array as its source and maintains Offset and Count properties to determine the 'segment'. Unfortunately, there isn't a way to retrieve JUST the segment as an array, so some folks have written wrappers for it, like here: ArraySegment - Returning the actual segment C#

ArraySegment<string> segment;

for (int i = 0; i < source.Length; i += 100)
{
    segment = new ArraySegment<string>(source, i, Math.Min(100, source.Length - i));

    // and to loop through the segment (Offset..Offset+Count-1 only)
    for (int s = segment.Offset; s < segment.Offset + segment.Count; s++)
    {
        Console.WriteLine(segment.Array[s]);
    }
}
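
For what it's worth, on .NET 4.5 and later ArraySegment<T> also implements IList<T>/IEnumerable<T>, so the segment itself can be enumerated (or copied out with LINQ's ToArray) without a wrapper. A rough sketch under that assumption:

// Requires .NET 4.5+ and System.Linq; the segment's enumerator covers only
// Offset..Offset+Count-1, not the whole underlying array.
for (int i = 0; i < source.Length; i += 100)
{
    var segment = new ArraySegment<string>(source, i, Math.Min(100, source.Length - i));

    foreach (string item in segment)
    {
        Console.WriteLine(item);
    }

    string[] justThisSegment = segment.ToArray(); // copies only the segment
}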

Performance of Array.Copy vs Skip/Take vs LINQ

Test method (in Release mode):

static void Main(string[] args)
{
    string[] source = new string[1000000];
    for (int i = 0; i < source.Length; i++)
    {
        source[i] = "string " + i.ToString();
    }

    string[] buffer;

    Console.WriteLine("Starting stop watch");

    Stopwatch sw = new Stopwatch();

    for (int n = 0; n < 5; n++)
    {
        sw.Reset();
        sw.Start();
        for (int i = 0; i < source.Length; i += 100)
        {
            buffer = new string[100];
            Array.Copy(source, i, buffer, 0, 100);
        }

        sw.Stop();
        Console.WriteLine("Array.Copy: " + sw.ElapsedMilliseconds.ToString());

        sw.Reset();
        sw.Start();
        for (int i = 0; i < source.Length; i += 100)
        {
            buffer = new string[100];
            buffer = source.Skip(i).Take(100).ToArray();
        }
        sw.Stop();
        Console.WriteLine("Skip/Take: " + sw.ElapsedMilliseconds.ToString());

        sw.Reset();
        sw.Start();
        String[][] chunks = source                            
            .Select((s, i) => new { Value = s, Index = i })                            
            .GroupBy(x => x.Index / 100)                            
            .Select(grp => grp.Select(x => x.Value).ToArray())                            
            .ToArray();
        sw.Stop();
        Console.WriteLine("LINQ: " + sw.ElapsedMilliseconds.ToString());
    }
    Console.ReadLine();
}

Results (in milliseconds):

Array.Copy:    15
Skip/Take:  42464
LINQ:         881

Array.Copy:    21
Skip/Take:  42284
LINQ:         585

Array.Copy:    11
Skip/Take:  43223
LINQ:         760

Array.Copy:     9
Skip/Take:  42842
LINQ:         525

Array.Copy:    24
Skip/Take:  43134
LINQ:         638
Chris Gessler
  • @psubsee2003 - thanks. And my guess is that it will outperform the Skip/Take solution as well, especially considering that .ToArray() has to be called. – Chris Gessler Jun 26 '12 at 13:24
  • In your Skip(), Take() example you have an extra array allocation that need not be there. – James Michael Hare Mar 06 '13 at 16:56
  • @JamesMichaelHare - nice catch! copy/paste issue. I'll take it out and rerun the tests, but I doubt it will make much of a difference. – Chris Gessler Mar 07 '13 at 00:57
  • @JamesMichaelHare - reran the test - same results. Keep in mind, this test breaks up a million element array, not likely something that will be done in the real world too often. When I lower the number to 10,000 elements, the percentage difference goes down more in line with LINQ, 10 - 20,000% slower. – Chris Gessler Mar 07 '13 at 02:51
  • @ChrisGessler Yeah, I didn't expect it to make a big difference. I think in this case the problem is that Skip() is not appropriate for this particular use, because you end up re-iterating over the array multiple times. So I would not be in favor of using `Skip()` in a loop like this, as it becomes an extra O(n^2) operation that just isn't necessary at all. – James Michael Hare Mar 07 '13 at 17:22
    @ChrisGessler - In fact, this seems like such a common use case I'd argue it could be given its own dedicated LINQ extension method (named something like `Slice()`). – James Michael Hare Mar 07 '13 at 17:33
  • Thanks, I went with the Array.Copy() solution. I made a performance benchmark and for the purpose that I need it it is amazing : ) thanks – badjuice Oct 27 '21 at 18:10
  • I get an error if the source array length isn't exactly divisible by the chunk size. Why isn't this accounted for? – jpro Oct 18 '22 at 06:39
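
A sketch of the kind of dedicated extension suggested in the comments above (the name ChunkBy is made up here); it uses yield return so it walks the source once and copies each window out, including a shorter final chunk:

public static class ChunkExtensions
{
    // Hypothetical helper: yields a copy of each window of at most 'size' elements.
    public static IEnumerable<T[]> ChunkBy<T>(this T[] source, int size)
    {
        for (int i = 0; i < source.Length; i += size)
        {
            int count = Math.Min(size, source.Length - i);
            T[] chunk = new T[count];
            Array.Copy(source, i, chunk, 0, count);
            yield return chunk;
        }
    }
}
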
52

You can use LINQ to group all items by chunk size and create new arrays afterwards.

// build sample data with 1200 Strings
string[] items = Enumerable.Range(1, 1200).Select(i => "Item" + i).ToArray();
// split on groups with each 100 items
String[][] chunks = items
                    .Select((s, i) => new { Value = s, Index = i })
                    .GroupBy(x => x.Index / 100)
                    .Select(grp => grp.Select(x => x.Value).ToArray())
                    .ToArray();

for (int i = 0; i < chunks.Length; i++)
{
    foreach (var item in chunks[i])
        Console.WriteLine("chunk:{0} {1}", i, item);
}

Note that it's not necessary to create new arrays (which costs CPU cycles and memory). You could also work with an IEnumerable<IEnumerable<String>> by omitting the two ToArray calls.
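
A sketch of that lazy variant (the same query, just without the two ToArray calls):

// Deferred version: no per-chunk arrays are built; GroupBy still buffers the
// source once when the query is first enumerated.
IEnumerable<IEnumerable<string>> lazyChunks = items
                    .Select((s, i) => new { Value = s, Index = i })
                    .GroupBy(x => x.Index / 100)
                    .Select(grp => grp.Select(x => x.Value));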

Here's the running code: http://ideone.com/K7Hn2

Tim Schmelter
    LINQ seems pretty useful for using relational database techniques without actually needing such a database. – JAB Jun 26 '12 at 13:11
  • This seems pretty good, so once I have them split into groups of 100 I can pass them to Amazon's MWS for processing? – thatuxguy Jun 26 '12 at 14:55
20

Here is another LINQ solution I found:

int[] source = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
int i = 0;
int chunkSize = 3;
int[][] result = source.GroupBy(s => i++ / chunkSize).Select(g => g.ToArray()).ToArray();

//result = [1,2,3][4,5,6][7,8,9]
fubo
11

You can use Skip() and Take():

string[] items = new string[]{ "a", "b", "c"};
string[] chunk = items.Skip(1).Take(1).ToArray();
Asif Mushtaq
8
    string[] amzProductAsins = GetProductAsin();
    List<string[]> chunks = new List<string[]>();
    for (int i = 0; i < amzProductAsins.Length; i += 100)
    {
        chunks.Add(amzProductAsins.Skip(i).Take(100).ToArray());
    }
Habib
  • @Habib.OSU I already have an array of 1200+ items... how would I use this with string[] amzProductAsins = GetProductAsin(); – thatuxguy Jun 26 '12 at 12:59
  • Still has < 1200, is this correct? Maybe it needs to count them first, I am guessing :D – thatuxguy Jun 26 '12 at 14:55
  • Skip().Take().ToArray() is VERY slow compared to Array.Copy – Chris Gessler Jun 27 '12 at 13:10
  • @ChrisGessler, You are right, but Skip().Take() is one of the possible ways to do it, and that was the first thing which came in my mind after seeing the question. – Habib Jun 28 '12 at 04:56
  • I agree that it's one possible way, but the question is "Best way to split an array", which requires a little more thought and research. – Chris Gessler Jun 28 '12 at 12:03
5

You can use List.GetRange:

for(var i = 0; i < source.Count; i += chunkSize)
{
    List<string> items = source.GetRange(i, Math.Min(chunkSize, source.Count - i));
}

Although not as fast as Array.Copy, I think it looks cleaner:

var list = Enumerable.Range(0, 723748).ToList();

var stopwatch = new Stopwatch();

for (int n = 0; n < 5; n++)
{
    stopwatch.Reset();
    stopwatch.Start();
    for(int i = 0; i < list.Count; i += 100)
    {
        List<int> c = list.GetRange(i, Math.Min(100, list.Count - i));
    }
    stopwatch.Stop();
    Console.WriteLine("List<T>.GetRange: " + stopwatch.ElapsedMilliseconds.ToString());

    stopwatch.Reset();
    stopwatch.Start();
    for (int i = 0; i < list.Count; i += 100)
    {
        List<int> c = list.Skip(i).Take(100).ToList();
    }
    stopwatch.Stop();
    Console.WriteLine("Skip/Take: " + stopwatch.ElapsedMilliseconds.ToString());

    stopwatch.Reset();
    stopwatch.Start();
    var test = list.ToArray();
    for (int i = 0; i < list.Count; i += 100)
    {
        int length = Math.Min(100, list.Count - i);
        int[] c = new int[length];
        Array.Copy(test, i, c, 0, length);
    }
    stopwatch.Stop();
    Console.WriteLine("Array.Copy: " + stopwatch.ElapsedMilliseconds.ToString());

    stopwatch.Reset();
    stopwatch.Start();
    List<List<int>> chunks = list
        .Select((s, i) => new { Value = s, Index = i })
        .GroupBy(x => x.Index / 100)
        .Select(grp => grp.Select(x => x.Value).ToList())
        .ToList();
    stopwatch.Stop();
    Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds.ToString());
}

Results in milliseconds:

List<T>.GetRange: 1
Skip/Take: 9820
Array.Copy: 1
LINQ: 161

List<T>.GetRange: 9
Skip/Take: 9237
Array.Copy: 1
LINQ: 148

List<T>.GetRange: 5
Skip/Take: 9470
Array.Copy: 1
LINQ: 186

List<T>.GetRange: 0
Skip/Take: 9498
Array.Copy: 1
LINQ: 110

List<T>.GetRange: 8
Skip/Take: 9717
Array.Copy: 1
LINQ: 148
Daniel
1

Use LINQ; you can use the Take() and Skip() functions.
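
A minimal sketch of that approach, assuming the 100-item chunk size from the question (note that Skip re-scans the source on every pass, so this is the simplest option rather than the fastest):

string[] amzProductAsins = GetProductAsin();   // the array from the question

for (int i = 0; i < amzProductAsins.Length; i += 100)
{
    // Skip past the chunks already taken, then take up to 100 more.
    string[] batch = amzProductAsins.Skip(i).Take(100).ToArray();
    // process batch
}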

fenix2222
  • 4,602
  • 4
  • 33
  • 56
1

General recursive extension method:

    public static IEnumerable<IEnumerable<T>> SplitList<T>(this IEnumerable<T> source, int maxPerList)
    {
        // Materialize once so Any/Take/Skip below don't re-enumerate the original source.
        var enumerable = source as IList<T> ?? source.ToList();
        if (!enumerable.Any())
        {
            return new List<IEnumerable<T>>();
        }
        // Emit the first chunk, then recurse on the remainder.
        return (new List<IEnumerable<T>>() { enumerable.Take(maxPerList) }).Concat(enumerable.Skip(maxPerList).SplitList<T>(maxPerList));
    }
Joel
  • 6,193
  • 6
  • 22
  • 22
-2

If the array does not divide evenly, this simple solution spreads the leftover elements across the various "chunks" as evenly as possible.

int[] arrInput = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
var result = SplitArrey(arrInput, 5);
foreach (var item in result) {
  Console.WriteLine("   {0}", String.Join(" ", item));
}

The function is:

public static List<int[]> SplitArrey(int[] arrInput, int nColumn) {

    List<int[]> result = new List<int[]>(nColumn);

    int itemsForColum = arrInput.Length / nColumn;
    int countSpareElement = arrInput.Length - (itemsForColum * nColumn);

    // Add an extra slot for each spare element
    int[] newColumLenght = new int[nColumn];
    for (int i = 0; i < nColumn; i++)
    {
        int addOne = (i < countSpareElement) ? 1 : 0;
        newColumLenght[i] = itemsForColum + addOne;
        result.Add(new int[itemsForColum + addOne]);
    }

    // Copy the values
    int offset = 0;
    for (int i = 0; i < nColumn; i++)
    {
        int count_items_to_copy = newColumLenght[i];
        Array.Copy(arrInput, offset, result[i], 0, count_items_to_copy);
        offset += newColumLenght[i];
    }
    return result;
}

The result is:

1 2 3
4 5 6
7 8
9 10
11 12
Francesco
  • public static T[] Slice(this T[] source, int index, int length) { int delta = source.Length - (index + length); int actualLength = delta >= 0 ? length : length + delta; // Reduce length when index + length > source.Length T[] slice = new T[actualLength]; Array.Copy(source, index, slice, 0, actualLength); return slice; } – user1920925 Nov 06 '17 at 09:22
    Too Complex for a straight simple thing – Muhammad Faizan Khatri Nov 02 '21 at 09:54