First let's do a few imports for later:
import Control.Applicative
import Control.Monad
import System.Random
import Data.List hiding (partition)
Code Fix
Always remember function application has higher precedence than infix operators: return k1 ++ k2
means (return k1) ++ k2
and round prop * n
means (round prop) * n
. You can use $
to separate a function from the expression you're applying it to, because f $ x = f x
but $
has really low precedence. You can use return $ k1 ++ k2
, for example.
You were getting your Ints and Doubles mixed up a bit (round prop * n)
rounds the proportion before multiplying, but you want to multiply first, so you'd need to apply fromIntegral
to the n
. I made a seperate function for this
(.*) :: Double -> Int -> Int
d .* i = floor $ d * fromIntegral i
So now instead of (round prop * n)
you can use (prop .* n)
. It cleans up the code a bit, and means if it's wrong, we can fix it in one function instead of all over the place.
I've provided a type signature to make error messages more informative, and also a second base case - it wasn't terminating because sometimes rounding caused it to ask for a list of length 0.
partition1 :: Int -> Int -> IO [Int]
partition1 0 total = return []
partition1 1 total = return [total]
partition1 n total = do
prop <- randomRIO(0.0, 1.0)
k1 <- partition1 (prop .* n) (prop .* total)
k2 <- partition1 (n - (prop .* n)) (total - (prop .* total))
return $ k1 ++ k2
I also took the liberty of giving it a more descriptive name.
Getting the right totals
Unfortunately, this compiles, but as Will Ness pointed out in comments, there's a glitch: it typically gives you numbers that total to less than the total. It turns out that's because you will call partition 0 n
for non-zero n
, asking for a list of length 0 that sums to something non-zero. Oops.
The idea behind your algorithm is to split the list and the total randomly, but keep the proportions the same for both, to keep the distribution from being one-sided, (the problem in the original question).
Let's use that idea, but prevent it from asking for a length zero - we'll need prop to be neither 0 nor 1.
partition2 :: Int -> Int -> IO [Int]
partition2 0 total = return []
partition2 1 total = return [total]
partition2 n total = do
new_n <- randomRIO(1,n-1)
let prop = fromIntegral new_n / fromIntegral n
k1 <- partition2 new_n (prop .* total)
k2 <- partition2 (n - new_n) (total - (prop .* total))
return $ k1 ++ k2
Now it can never give us the wrong total. Hooray!
Random isn't the same as fair
But oops: partition2 18 10000
gives us
[555,555,555,555,556,556,555,556,556,556,555,555,556,556,555,556,556,556]
The problem is that fair isn't the same as random. This algorithm is very fair, but not very random. Let's let it choose the proportion seperately from the length:
partition3 :: Int -> Int -> IO [Int]
partition3 0 total = return []
partition3 1 total = return [total]
partition3 n total = do
new_n <- randomRIO(1,n-1)
new_total <- randomRIO(0,total) -- it's fine to have zeros.
k1 <- partition3 new_n new_total
k2 <- partition3 (n - new_n) (total - new_total)
return $ k1 ++ k2
That looks better: partition3 15 20000
gave me
[1134,123,317,725,1031,3897,8089,2111,164,911,25,0,126,938,409]
Random isn't fair, but it isn't biased either
This is apparently much better, but essentially the binary partitioning we're doing is introducing bias.
You can test a lot of runs by looking at
check :: (Int -> Int -> IO [Int]) -> Int -> Int -> Int -> IO ()
check f n total times = mapM_ print =<< map average.transpose.map (righttotal total) <$> replicateM times (f n total)
where average xs = fromIntegral (sum xs)/fromIntegral total
righttotal tot xs | sum xs == tot = xs
| otherwise = error $ "wrong total: " ++ show (sum xs)
which on one run of check partition3 11 10000 1000
gave me
180.7627
97.6642
79.7251
66.9267
64.5253
59.4046
56.9186
66.6599
70.6639
88.9945
167.7545
Without going into loads of test data and analysis, interesting though it was, there's a disproportionate amount of 0 when n
isn't a factor of total
, and the distribution isn't even, it's cup shaped - the algorithm ends up cramming data at one end.
A way out
Instead of choosing bit by bit how much is going in sublists, let's generate all the places a subtotal will end all at once. Of course one of them has to be the total, and we'd better sort them once we generate them.
stopgaps :: Int -> Int -> IO [Int]
stopgaps parts total = sort.(total:) <$> replicateM (parts-1) (randomRIO (0,total))
Here I use replicateM :: Int -> m a -> m [a]
to generate parts-1
random numbers in the correct range.
I'd like to plug an unsung hero:
mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])
for accumulating along a list, generating a new list.
gapsToLengths :: [Int] -> (Int,[Int])
gapsToLengths = mapAccumL between 0
where between previous new = (new,new - previous)
partition4 :: Int -> Int -> IO [Int]
partition4 parts total = snd.gapsToLengths <$> stopgaps parts total
Does it work?
A few test runs of partition4 11 10000
, pretty printed :
[ 786, 20, 607, 677, 1244, 1137, 990, 50, 1716, 813, 1960]
[ 406, 110, 2556, 126, 1289, 567, 348, 1230, 171, 613, 2584]
[ 368, 1794, 136, 1266, 583, 93, 1514, 66, 1594, 1685, 901]
[ 657, 1296, 1754, 411, 691, 1865, 531, 270, 1941, 286, 298]
[2905, 313, 842, 796, 698, 1104, 82, 1475, 22, 619, 1144]
[1411, 966, 530, 129, 81, 561, 1779, 1179, 301, 607, 2456]
[1143, 409, 903, 27, 855, 354, 887, 1898, 1880, 301, 1343]
[ 260, 643, 96, 323, 142, 74, 401, 977, 3685, 2690, 709]
[1350, 979, 377, 765, 137, 1295, 615, 592, 2099, 1088, 703]
[2411, 958, 330, 1433, 1355, 680, 1075, 41, 988, 81, 648]
That looks random. Let's check there's no bias:
check partition4 11 10000 1000
92.6425
93.4513
92.3544
90.8508
88.0297
91.7731
88.7939
86.5268
86.3502
95.2499
93.9774
At last!