How to split a list into evenly random chunks?

Question

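import System.Random

-- Draw b random sizes out of a total capacity c.
sizes :: StdGen -> Int -> Int -> [Int]
sizes _ _ 0 = []
sizes g c b = a : sizes g' (c - a) (b - 1)
  where
    -- Each size is drawn from the capacity that is still left.
    (a, g') = randomR (1, c - b) g

main :: IO ()
main = do
    g <- newStdGen
    print $ sizes g 100 10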

I've reduced this problem to producing a list of random numbers that sum to a given number.

This code basically works, but the numbers don't seem to be evenly distributed (for example, [7,46,3,4,26,8,1,2,1,1]).

They tend to be big at the start and get smaller toward the end. The reason is quite obvious from the code.

What I'd like to have is something like [8, 6, 10, 4, 15..], where the numbers aren't that different from each other.

It's not totally clear to me what you're asking yet. Let me propose a few things you could be asking, and see what you think. You could be asking to draw uniformly at random from among the set of [partitions](https://en.wikipedia.org/wiki/Partition_(number_theory)) of a given number and of a given size. Or you could be asking to draw uniformly at random from among the set of ways to put a given number of (unlabeled) balls into a given number of (labeled) boxes. (...or the other three variations with other choices of labeled/unlabeled for boxes and balls.) — Daniel Wagner, Oct 11 '17 at 17:19
See https://stackoverflow.com/a/31814348/2166798 for a possible answer. The implementation is Python, but the algorithm is straightforward. — pjs, Oct 11 '17 at 17:24
(Come to think of it, probably selecting uniformly from among partitions is the same as selecting uniformly from among ways to put unlabeled balls into unlabeled boxes, up to some rejiggering of the parameters.) — Daniel Wagner, Oct 11 '17 at 17:25
@pjs Yes, that's what I wanted. Can you explain why differences add up to `n`? — user1685095, Oct 11 '17 at 18:13
@user1685095 Because the total length of the range [0, n] is n, and the differences are like the lengths of sticks (or strings, or rulers), which span the total range when placed end-to-end. — pjs, Oct 11 '17 at 18:29
Concrete example: I want 4 values that add to 20. I generate 3 random numbers to split the range into 4 intervals, perhaps (18, 7, 12). I add 0 and 20 and sort, to get (0, 7, 12, 18, 20). The successive differences are (7, 5, 6, 2). Voila, there are 4 of them and they sum to 20! — pjs, Oct 11 '17 at 18:34
Another way to think of the prior example is that I'm summing (7 - 0) + (12 - 7) + (18 - 12) + (20 - 18). A moment's inspection will show that all the values other than 0 and 20 occur with both plus and minus signs, so they cancel out, leaving you with the total = (20 - 0) = 20. — pjs, Oct 11 '17 at 18:42
@pjs I guess if you posted an answer with a Haskell version, I would accept it. Thanks for your help! — user1685095, Oct 12 '17 at 07:37
Since I don't know Haskell, that would be a challenge. Glad if that helped though. — pjs, Oct 12 '17 at 13:13
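
The approach pjs describes in the comments above could be sketched in Haskell roughly as follows. This is a minimal sketch, not a posted answer from the thread: sizesFromCuts is a hypothetical name, and it assumes the cut points have already been drawn, so it needs nothing beyond base.

```haskell
import Data.List (sort)

-- | Turn cut points inside [0, n] into chunk sizes that sum to n.
-- 'sizesFromCuts' is an illustrative name, not a library function.
sizesFromCuts :: Int -> [Int] -> [Int]
sizesFromCuts n cuts = zipWith (-) (drop 1 bounds) bounds
  where
    -- Add the endpoints 0 and n, then sort so neighbours are adjacent.
    bounds = sort (0 : n : cuts)

main :: IO ()
main = do
  -- pjs's example: cuts (18, 7, 12) in the range [0, 20].
  let chunkSizes = sizesFromCuts 20 [18, 7, 12]
  print chunkSizes        -- [7,5,6,2]
  print (sum chunkSizes)  -- 20
```

For k chunks, the cut points could come from something like take (k - 1) (randomRs (0, n) g). Note that repeated cut points, or cuts landing on 0 or n, produce chunks of size zero; if that is unwanted, the cuts would need to be distinct values strictly between 0 and n.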

Mark Seemann · Answer 1 · 2017-10-11T11:43:25.550

It's not quite clear to me what you want to do, but if you want a uniform distribution between a minimum and a maximum, you can use randomRs:

Prelude> :m +System.Random
Prelude System.Random> g <- newStdGen
Prelude System.Random> take 10 $ randomRs (10, 100) g
[48,93,21,50,84,57,25,80,68,18]

If you want these random numbers to sum to a particular number, you can basically start picking from the left until you get close enough. The inits function could help you with that:

Prelude System.Random> :m +Data.List
Prelude System.Random Data.List> take 10 $ inits $ randomRs (10, 100) g
[[],[48],[48,93],[48,93,21],[48,93,21,50],[48,93,21,50,84],[48,93,21,50,84,57],
[48,93,21,50,84,57,25],[48,93,21,50,84,57,25,80],[48,93,21,50,84,57,25,80,68]]

Instead of take 10, you could start going through this list of lists until you find one that's close enough. For instance, you could calculate the sum of all of those lists:

Prelude System.Random Data.List> fmap sum $ take 10 $ inits $ randomRs (10, 100) g
[0,48,141,162,212,296,353,378,458,526]

So, if you're aiming for, say, 500, you can see that the ninth sum (the sum of the first eight numbers) is 458, whereas the tenth sum is too high. In other words, the first eight numbers will get you to 458. How will you reach 500?

One option is simply to say that the last number has to be 500 - 458 = 42, but then I'm not sure the distribution still counts as uniform, because the last number is deterministic.

Another option would be to keep generating random numbers until you have a sequence that's a perfect fit.

Since I don't know the exact requirements, I can't advise on which way would be best.

In the above example, I used fmap sum to illustrate my point. The problem is that this throws away the numbers that generated each sum. As far as I understand, you actually want those numbers, so you'll probably need a more complicated left fold that calculates the sum while still remembering the numbers that produced it. You can use foldl or foldl' for that.
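
One way to keep both the sum and the numbers behind it is to pair every prefix from inits with its sum and take the last pair that does not overshoot. The following is a minimal sketch under assumptions: prefixUpTo is a hypothetical helper name, the target is assumed to be non-negative and the numbers positive, and it uses inits with takeWhile instead of foldl because a left fold would never finish on the infinite list that randomRs returns.

```haskell
import Data.List (inits)

-- | Longest prefix of xs whose sum does not exceed target, with that sum.
-- Assumes target >= 0 and positive numbers, so the sums keep growing.
-- 'prefixUpTo' is an illustrative name, not a library function.
prefixUpTo :: Int -> [Int] -> ([Int], Int)
prefixUpTo target xs =
  last (takeWhile ((<= target) . snd) [ (p, sum p) | p <- inits xs ])

main :: IO ()
main = do
  -- The ten numbers from the GHCi session above.
  let xs = [48, 93, 21, 50, 84, 57, 25, 80, 68, 18]
      (chosen, total) = prefixUpTo 500 xs
  print chosen         -- [48,93,21,50,84,57,25,80]
  print total          -- 458
  print (500 - total)  -- 42, the deterministic last number
```

Recomputing sum for every prefix is quadratic in the prefix length; it keeps the sketch short, but a running total would be cheaper for long lists.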

What I want is the second answer to this question https://stackoverflow.com/questions/2640053/getting-n-random-numbers-that-the-sum-is-m adapted to work with integers instead of floating point numbers. — user1685095, Oct 11 '17 at 11:24
I'm not sure how I can make this clearer: `producing list of random numbers that sums up into some number`. For example, let's say I have the number `10` and I need three integers that sum to it, such as `[3, 3, 4]`. — user1685095, Oct 11 '17 at 11:26
I'm sorry, but this is a very bad solution. It takes a lot of time to compute, even for a capacity of `100`. The solution from my link is much better. Can you help adapt it to Haskell and integers? I'm thinking about splitting the floating-point numbers into integer and fractional parts and then somehow distributing the fractional parts so that the result sums exactly to the given number. — user1685095, Oct 11 '17 at 12:18
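
One possible reading of the idea in the last comment is the largest remainder method: scale the floating-point values so they sum to the target, round each one down, and hand the units lost to rounding to the entries with the largest fractional parts. This is a minimal sketch under assumptions, not the linked answer's code: roundToSum is a hypothetical name, and the input weights are assumed to be positive and already generated by whatever method is preferred.

```haskell
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)

-- | Scale positive weights so they sum to total, then round them to
-- integers while keeping the sum exact.
-- 'roundToSum' is an illustrative name, not a library function.
roundToSum :: Int -> [Double] -> [Int]
roundToSum total ws = zipWith bump [0 ..] floors
  where
    scale    = fromIntegral total / sum ws
    scaled   = map (* scale) ws
    floors   = map floor scaled :: [Int]
    -- Units lost by rounding every entry down.
    leftover = total - sum floors
    fracs    = zipWith (\x f -> x - fromIntegral f) scaled floors
    -- Indices of the entries with the largest fractional parts.
    winners  = take leftover
             . map fst
             . sortBy (comparing (Down . snd))
             $ zip [0 :: Int ..] fracs
    -- Give one extra unit to each winning index.
    bump i f = if i `elem` winners then f + 1 else f

main :: IO ()
main = do
  let parts = roundToSum 10 [1, 1, 1]
  print parts
  print (sum parts)  -- 10
```

The elem lookup is linear per entry, which is fine for short lists; a set of winning indices would scale better.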
