Get all split options to k chunks

Question

I need to split an array arr into k chunks where the union of all the chucks is arr and there in no same element in two chunks.

For example for

int[] arr = new int[] {1, 2, 3, 4, 5}; 
int k = 3;

I need to return all the possible splits:

[[1], [2], [3,4,5]]
[[1], [2,3], [4,5]]
[[1], [2,3,4], [5]]
[[1,2], [3], [4,5]]
[[1,2], [3,4], [5]]
[[1,2,3], [4], [5]]

How can I do that efficiently in C#?

does this answer your question? https://stackoverflow.com/questions/13731796/create-batches-in-linq — MC LinkTimeError, Jan 04 '23 at 15:27
I will say I will be easier to compute the cut index/ nb element. There is classical leet challenge to get all int combinaison that sums up to a target. And use Skip(). Take to get the chunk — Drag and Drop, Jan 04 '23 at 15:31
You have to choose `k - 1` splitting points from `arr.Length - 1` ones — Dmitry Bychenko, Jan 04 '23 at 15:44
@MCLinkTimeError no, I saw that. They choose the chunk size, not the number of chunks. — nrofis, Jan 04 '23 at 15:53
DragandDrop DmitryBychenko, I thought about that, but how can I do that? — nrofis, Jan 04 '23 at 15:56
@nrofis: You have a standard combinatoric problem: "take k - 1 from n - 1 unordered without repetitions"; well, let me provide some code (please, see my answer) — Dmitry Bychenko, Jan 04 '23 at 17:00

Dmitry Bychenko · Accepted Answer · 2023-01-04T19:22:18.920

You have a combinatoric problem: given an array of n item you should sample k subarrays or, put it differently, k - 1 splits from n - 1:

  [A, B, C, D, E]    n items, n - 1 possible splits 
     ^     ^ 
     |     |         k - 1 splits from n - 1 avaialable
     |     |
  [A] [B, C] [D, E]  k chunks

Note that we have standard combinatoric problem

k - 1 from n - 1 unordered without repetitions

Code for such sampling can be

private static IEnumerable<int[]> Samples(int take, int from) {
  if (take > from || take <= 0 || from <= 0)
    yield break;

  int[] array = Enumerable.Range(0, take).ToArray();

  for (bool agenda = true; agenda; ) {
    agenda = false;

    yield return array.ToArray();

    for (int i = array.Length - 1; i >= 0; --i) 
      if (array[i] < from - take + i) {
        agenda = true;

        array[i] += 1;

        for (int j = i + 1; j < array.Length; ++j)
          array[j] = array[i] + j - i;

        break;
      }
  }
}

Having this sampling routine, we can implement splitting into chunks:

private static IEnumerable<T[][]> MyChunks<T>(T[] array, int take) {
  if (take > array.Length)
    yield break;

  foreach (var sample in Samples(take - 1, array.Length - 1)) {
    T[][] result = new T[take][];

    for (int i = 0, from = 0, to; i <= sample.Length; ++i, from = to) {
      to = i < sample.Length ? sample[i] + 1 : array.Length;

      result[i] = array
        .Skip(from)
        .Take(to - from)
        .ToArray();
    }

    yield return result;
  }
}

Demo:

var arr = new int[] { 1, 2, 3, 4, 5};
int k = 3;

string report = string.Join(Environment.NewLine, MyChunks(arr, k)
  .Select(chunk => "[" + string.Join(", ", chunk
     .Select(item => $"[{string.Join(",", item)}]")) + "]"));

Console.Write(report);

Output:

[[1], [2], [3,4,5]]
[[1], [2,3], [4,5]]
[[1], [2,3,4], [5]]
[[1,2], [3], [4,5]]
[[1,2], [3,4], [5]]
[[1,2,3], [4], [5]]

Good Night Nerd Pride · Answer 2 · 2023-01-05T12:19:23.383

On way to solve such a problem is to divide and conquer. We could first solve how we compute the possible splits of an array if we only ever wanted to split into two sub arrays (k = 2).

I.e. our function, when given 1,2,3,4, should return 1|2,3,4, 1,2|3,4, and 1,2,3|4 where | marks the border between the left and right subarrays.

Note how the | starts at the left-most position (that still produces a non-empty split on the left) and gradually moves to the right, until no more non-empty split can be produced for the right.

A C# function that does this is shown below:

IEnumerable<(Memory<int>, Memory<int>)> PossibleSplits(
    Memory<int> xs) {
    for (var i = 1; i < xs.Length; i++)
        yield return (xs[..i], xs[i..]);
}

var splits = PossibleSplits(new[] { 1, 2, 3, 4, 5 }.AsMemory());

As you can see it returns a sequence of left/right tuples.

I'm using Memory so no new arrays are allocated when splitting the input data.

One nice property of this function is that it returns an empty sequence when the input array's length is smaller than 2.

So how do we split to an abritrary number of splits, not only 2? One trick is to recursively split the left side of a split again until the left side becomes to small to be split.

To that end we have to modify the above function in several ways:

The return value must change from a sequence of Memory tuples to a sequence of Memory sequences.
i cannot always start at position 1. When splitting into an arbitrary number k of splits, our function must make sure that the left side always holds enough elements for it to be split again into at least k - 1 subarrays. Therefor i must start at k - 1.
Obviously the function needs to call itself to become recursive. And it must do so with one of the two subarrays (we choose left) and the predecssor of k. Decrementing k accounts for the other subarray that won't be split up any further, but will be returned as part of the result of course.
We must add an exit condition to break the recursive cycle. When k is below 2 then we cannot meaningfully split the array (regardless of its size) and just return the input array wrapped as a singleton sequence.

Below is a recursive version of the function above that does just that:

public static class Ext {
    public static IEnumerable<IEnumerable<Memory<int>>> PossibleSplits(
        this Memory<int> xs,
        int k) {
        if (k < 2) {
            yield return Enumerable.Repeat(xs, 1);
            yield break;
        }
        
        for (var i = k - 1; i < xs.Length; i++) {
            var (left, right) = (xs[..i], xs[i..]);
            foreach (var split in left.PossibleSplits(k - 1))
                yield return split.Append(right);
        }
    }
}

var splits = new[] { 1, 2, 3, 4, 5 }
    .AsMemory()
    .PossibleSplits(k: 3);

It's an extension method, just because I like them.

You asked for an efficient solution, but efficiency is relative. This solution is efficient in terms of...

...memory because now new arrays are allocated (hush, we don't talk about all the enumerators created by this code);
...laziness, because only requested values will be computed;
...code size.

It's not the most efficient in terms of runtime speed, because of all the overhead enumerators introduce.

Get all split options to k chunks

2 Answers2