0

I need to split an array arr into k chunks where the union of all the chucks is arr and there in no same element in two chunks.

For example for

int[] arr = new int[] {1, 2, 3, 4, 5}; 
int k = 3;

I need to return all the possible splits:

[[1], [2], [3,4,5]]
[[1], [2,3], [4,5]]
[[1], [2,3,4], [5]]
[[1,2], [3], [4,5]]
[[1,2], [3,4], [5]]
[[1,2,3], [4], [5]]

How can I do that efficiently in C#?

Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
nrofis
  • 8,975
  • 14
  • 58
  • 113

2 Answers2

1

You have a combinatoric problem: given an array of n item you should sample k subarrays or, put it differently, k - 1 splits from n - 1:

  [A, B, C, D, E]    n items, n - 1 possible splits 
     ^     ^ 
     |     |         k - 1 splits from n - 1 avaialable
     |     |
  [A] [B, C] [D, E]  k chunks   

Note that we have standard combinatoric problem

k - 1 from n - 1 unordered without repetitions

Code for such sampling can be

private static IEnumerable<int[]> Samples(int take, int from) {
  if (take > from || take <= 0 || from <= 0)
    yield break;

  int[] array = Enumerable.Range(0, take).ToArray();

  for (bool agenda = true; agenda; ) {
    agenda = false;

    yield return array.ToArray();

    for (int i = array.Length - 1; i >= 0; --i) 
      if (array[i] < from - take + i) {
        agenda = true;

        array[i] += 1;

        for (int j = i + 1; j < array.Length; ++j)
          array[j] = array[i] + j - i;

        break;
      }
  }
}

Having this sampling routine, we can implement splitting into chunks:

private static IEnumerable<T[][]> MyChunks<T>(T[] array, int take) {
  if (take > array.Length)
    yield break;

  foreach (var sample in Samples(take - 1, array.Length - 1)) {
    T[][] result = new T[take][];

    for (int i = 0, from = 0, to; i <= sample.Length; ++i, from = to) {
      to = i < sample.Length ? sample[i] + 1 : array.Length;

      result[i] = array
        .Skip(from)
        .Take(to - from)
        .ToArray();
    }

    yield return result;
  }
}

Demo:

var arr = new int[] { 1, 2, 3, 4, 5};
int k = 3;

string report = string.Join(Environment.NewLine, MyChunks(arr, k)
  .Select(chunk => "[" + string.Join(", ", chunk
     .Select(item => $"[{string.Join(",", item)}]")) + "]"));

Console.Write(report);

Output:

[[1], [2], [3,4,5]]
[[1], [2,3], [4,5]]
[[1], [2,3,4], [5]]
[[1,2], [3], [4,5]]
[[1,2], [3,4], [5]]
[[1,2,3], [4], [5]]
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
0

On way to solve such a problem is to divide and conquer. We could first solve how we compute the possible splits of an array if we only ever wanted to split into two sub arrays (k = 2).

I.e. our function, when given 1,2,3,4, should return 1|2,3,4, 1,2|3,4, and 1,2,3|4 where | marks the border between the left and right subarrays.

Note how the | starts at the left-most position (that still produces a non-empty split on the left) and gradually moves to the right, until no more non-empty split can be produced for the right.

A C# function that does this is shown below:

IEnumerable<(Memory<int>, Memory<int>)> PossibleSplits(
    Memory<int> xs) {
    for (var i = 1; i < xs.Length; i++)
        yield return (xs[..i], xs[i..]);
}

var splits = PossibleSplits(new[] { 1, 2, 3, 4, 5 }.AsMemory());

As you can see it returns a sequence of left/right tuples.

I'm using Memory so no new arrays are allocated when splitting the input data.

One nice property of this function is that it returns an empty sequence when the input array's length is smaller than 2.

So how do we split to an abritrary number of splits, not only 2? One trick is to recursively split the left side of a split again until the left side becomes to small to be split.

To that end we have to modify the above function in several ways:

  • The return value must change from a sequence of Memory tuples to a sequence of Memory sequences.
  • i cannot always start at position 1. When splitting into an arbitrary number k of splits, our function must make sure that the left side always holds enough elements for it to be split again into at least k - 1 subarrays. Therefor i must start at k - 1.
  • Obviously the function needs to call itself to become recursive. And it must do so with one of the two subarrays (we choose left) and the predecssor of k. Decrementing k accounts for the other subarray that won't be split up any further, but will be returned as part of the result of course.
  • We must add an exit condition to break the recursive cycle. When k is below 2 then we cannot meaningfully split the array (regardless of its size) and just return the input array wrapped as a singleton sequence.

Below is a recursive version of the function above that does just that:

public static class Ext {
    public static IEnumerable<IEnumerable<Memory<int>>> PossibleSplits(
        this Memory<int> xs,
        int k) {
        if (k < 2) {
            yield return Enumerable.Repeat(xs, 1);
            yield break;
        }
        
        for (var i = k - 1; i < xs.Length; i++) {
            var (left, right) = (xs[..i], xs[i..]);
            foreach (var split in left.PossibleSplits(k - 1))
                yield return split.Append(right);
        }
    }
}

var splits = new[] { 1, 2, 3, 4, 5 }
    .AsMemory()
    .PossibleSplits(k: 3);

It's an extension method, just because I like them.

You asked for an efficient solution, but efficiency is relative. This solution is efficient in terms of...

  • ...memory because now new arrays are allocated (hush, we don't talk about all the enumerators created by this code);
  • ...laziness, because only requested values will be computed;
  • ...code size.

It's not the most efficient in terms of runtime speed, because of all the overhead enumerators introduce.

Good Night Nerd Pride
  • 8,245
  • 4
  • 49
  • 65