Split a list into n randomly sized chunks

Question

I am trying to split a list of P elements into n sublists (n = I in the code below), where the size of each sublist is random (each with at least one entry; assume P > I). I used the numpy.split function, which works but does not satisfy my randomness condition. You may ask which distribution the randomness should follow; I think it should not matter. I checked several posts, but they were not equivalent to mine because they split into almost equally sized chunks. If this is a duplicate, let me know. Here is my approach:

import numpy as np

P = 10
I = 5
mylist = range(1, P + 1)
[list(x) for x in np.split(np.array(mylist), I)]

This approach fails when P is not divisible by I. Furthermore, it creates equally sized chunks, not randomly sized ones. Another constraint: I do not want to use the random package, but I am fine with numpy. Don't ask me why; I wish I had a logical reason for it.
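To make the failure concrete, here is a minimal sketch; the names `data`, `parts`, and `split_failed` are illustrative and not from the original post. With an integer second argument, `np.split` requires an equal division and raises `ValueError` otherwise. `np.array_split` tolerates uneven division but still produces near-equal chunks, so neither gives random sizes.

```python
import numpy as np

P, I = 11, 5  # P is not divisible by I
data = np.arange(1, P + 1)

try:
    np.split(data, I)  # integer argument: requires an equal division
    split_failed = False
except ValueError:
    split_failed = True

# array_split accepts uneven division, but the sizes are still near-equal
parts = np.array_split(data, I)
sizes = [len(p) for p in parts]
print(split_failed, sizes)
```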

Based on the answer provided by Mad Physicist, this is the code I tried:

P = 10
I = 5

data = np.arange(P) + 1
indices = np.arange(1, P)
np.random.shuffle(indices)
indices = indices[:I - 1]
result = np.split(data, indices)
result

Output:

[array([1, 2]),
 array([3, 4, 5, 6]),
 array([], dtype=int32),
 array([4, 5, 6, 7, 8, 9]),
 array([10])]

Why do you insist on everything being a list if you're ok with numpy? — Mad Physicist, Sep 12 '19 at 23:34
This seems like a [Stars and bars problem](https://en.wikipedia.org/wiki/Stars_and_bars_(combinatorics)); divide P stars into I bins where each bin gets at least one element. — Chris, Sep 12 '19 at 23:45
Possible duplicate of [General bars and stars](https://stackoverflow.com/questions/28965734/general-bars-and-stars) — Chris, Sep 12 '19 at 23:45
@Chris I tried the function Kevin provided in there. But, I could not see any relationship. — tcokyasar, Sep 13 '19 at 00:08

score 3 · Answer 1 · answered Sep 13 '19 at 01:05

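The problem can be reformulated as choosing I - 1 distinct random split points from {1, 2, ..., P - 1}, which is the stars and bars view: each split point is a bar placed in one of the P - 1 gaps between consecutive elements, so every chunk gets at least one element.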

Therefore, it can be implemented as follows:

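import numpy as np

P, I = 10, 5
data = np.arange(P) + 1

# Draw I - 1 distinct cut points from {1, ..., P - 1}
split_points = np.random.choice(P - 1, I - 1, replace=False) + 1
split_points.sort()  # np.split needs increasing cut points
result = np.split(data, split_points)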
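The same idea can be wrapped in a reusable function. This is only a sketch under stated assumptions: the function name `random_chunks` and its parameters are hypothetical, and it uses `np.random.default_rng`, which requires NumPy 1.17 or later, so that a seed can be passed for reproducibility.

```python
import numpy as np

def random_chunks(data, n_chunks, rng=None):
    """Split `data` into `n_chunks` non-empty chunks of random size.

    Hypothetical helper for illustration; not part of NumPy.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    if not 1 <= n_chunks <= len(data):
        raise ValueError("need 1 <= n_chunks <= len(data)")
    # Pick n_chunks - 1 distinct cut points from {1, ..., len(data) - 1}
    cuts = rng.choice(np.arange(1, len(data)), size=n_chunks - 1, replace=False)
    cuts.sort()  # np.split expects increasing cut points
    return np.split(data, cuts)

chunks = random_chunks(np.arange(1, 11), 5, rng=np.random.default_rng(0))
print([c.tolist() for c in chunks])
```

Because the cut points are distinct and lie strictly inside the array, every chunk holds at least one element and the chunks concatenate back to the original data.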

answered Sep 13 '19 at 01:05

GZ0

I'm glad you added this example. – Mad Physicist Sep 13 '19 at 01:06

Mad Physicist · Accepted Answer · 2019-09-13T00:49:33.400

np.split is still the way to go. If you pass in a sequence of integers, split will treat them as cut points. Generating random cut points is easy. You can do something like this:

import numpy as np

P = 10
I = 5

data = np.arange(P) + 1
# Cut points in 1..P-1; a cut at 0 would create an empty first chunk
indices = np.random.randint(1, P, size=I - 1)

You want I - 1 cut points to get I chunks. The indices need to be sorted, and duplicates need to be removed. np.unique does both for you. You may end up with fewer than I chunks this way:

result = np.split(data, np.unique(indices))
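As a small illustration of the point about np.unique, the sketch below uses a fixed, hypothetical draw (`raw`) that contains a duplicate instead of a random one, so the outcome is reproducible. After deduplication only three cut points remain, which gives four chunks rather than five.

```python
import numpy as np

data = np.arange(1, 11)
raw = np.array([7, 3, 7, 5])  # hypothetical draw with a duplicate cut point
cuts = np.unique(raw)         # sorted and duplicate-free: [3, 5, 7]
result = [r.tolist() for r in np.split(data, cuts)]
print(result)  # [[1, 2, 3], [4, 5], [6, 7], [8, 9, 10]]
```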

If you absolutely need exactly I chunks, sample the cut points without replacement. That can be implemented, for example, via np.random.shuffle:

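indices = np.arange(1, P)    # candidate cut points 1..P-1
np.random.shuffle(indices)   # shuffles in place
indices = indices[:I - 1]    # keep I - 1 distinct cut points
indices.sort()               # np.split needs increasing cut points
result = np.split(data, indices)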
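The sort step matters because np.split slices between consecutive cut points. The sketch below uses a fixed, hypothetical set of cut points (`cuts`) to compare the unsorted and sorted cases; the unsorted case reproduces the kind of output shown in the question, with an empty chunk and repeated elements.

```python
import numpy as np

data = np.arange(1, 11)
cuts = np.array([2, 6, 3, 9])  # hypothetical unsorted draw of distinct cut points

# Unsorted: the slice data[6:3] is empty and data[3:9] overlaps an earlier chunk
unsorted = [c.tolist() for c in np.split(data, cuts)]
# Sorted: chunks are disjoint, non-empty, and cover the data exactly once
ordered = [c.tolist() for c in np.split(data, np.sort(cuts))]

print(unsorted)  # [[1, 2], [3, 4, 5, 6], [], [4, 5, 6, 7, 8, 9], [10]]
print(ordered)   # [[1, 2], [3], [4, 5, 6], [7, 8, 9], [10]]
```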

edited Sep 13 '19 at 00:49

answered Sep 12 '19 at 23:45

Mad Physicist

You can try `P=11` and see that np.split does not work. – tcokyasar Sep 13 '19 at 00:01
Further, np.split does not ensure that the length of each sublist (or sub-array) is random. On the contrary, it creates sub-arrays of equal length. Correct me if I am wrong. – tcokyasar Sep 13 '19 at 00:03
I did have a typo. The second argument should have been `indices`, not `I`. Fixed now. The behavior of `split` is very dependent on the second argument. – Mad Physicist Sep 13 '19 at 00:14
`np.random.choice` is better suited for that task, although `np.random.shuffle` would also work. – GZ0 Sep 13 '19 at 00:18
@GZ0. I'm on a mobile platform, and that's the first thing that came to mind. You're absolutely right though, it's much better suited. – Mad Physicist Sep 13 '19 at 00:19
I updated my post. I still could not get what I want. I see some indices twice in different sub-arrays and some empty sub-arrays. @GZ0 how would you implement it with `np.random.choice`? Could you please elaborate? – tcokyasar Sep 13 '19 at 00:37
@user8028576. You're right, I forgot to sort the indices. They must be increasing monotonically. Fixed – Mad Physicist Sep 13 '19 at 00:49
I think this is happening because you are on mobile platform and cannot see the outputs. – tcokyasar Sep 13 '19 at 00:50
There you go! Thank you so much! – tcokyasar Sep 13 '19 at 00:53
@user8028576. I think you're right. Glad you finally got it working. I suggest you spend some time reading through the links I put in the answer and understanding exactly *why* it works. – Mad Physicist Sep 13 '19 at 00:54
