13

I am hoping to create 3 (non-negative) quasi-random numbers that sum to one, and repeat over and over.

Basically I am trying to partition something into three random parts over many trials.

While I am aware of

a = runif(3,0,1)

I was thinking that I could use 1-a as the max in the next runif, but it seems messy.

But these of course don't sum to one. Any thoughts, oh wise stackoverflow-ers?

Mr weasel
  • 113
  • 1
  • 8
mmann1123
  • 5,031
  • 7
  • 41
  • 49

7 Answers7

16

This question involves subtler issues than might be at first apparent. After looking at the following, you may want to think carefully about the process that you are using these numbers to represent:

## My initial idea (and commenter Anders Gustafsson's):
## Sample 3 random numbers from [0,1], sum them, and normalize
jobFun <- function(n) {
    m <- matrix(runif(3*n,0,1), ncol=3)
    m<- sweep(m, 1, rowSums(m), FUN="/")
    m
}

## Andrie's solution. Sample 1 number from [0,1], then break upper 
## interval in two. (aka "Broken stick" distribution).
andFun <- function(n){
  x1 <- runif(n)
  x2 <- runif(n)*(1-x1)
  matrix(c(x1, x2, 1-(x1+x2)), ncol=3)
}

## ddzialak's solution (vectorized by me)
ddzFun <- function(n) {
    a <- runif(n, 0, 1)
    b <- runif(n, 0, 1)
    rand1 = pmin(a, b)
    rand2 = abs(a - b)
    rand3 = 1 - pmax(a, b)
    cbind(rand1, rand2, rand3)
}
    
## Simulate 10k triplets using each of the functions above
JOB <- jobFun(10000)
AND <- andFun(10000)
DDZ <- ddzFun(10000)

## Plot the distributions of values
par(mfcol=c(2,2))
hist(JOB, main="JOB")
hist(AND, main="AND")
hist(DDZ, main="DDZ")

enter image description here

Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • Nice, I was thinking about plotting the results but you already did this. It's interesting to see that apparently none of the solutions really does what one would have liked intuitively. It's also interesting that in these plots you can't really see that DDZ does the right thing according to the means while AND does not even that. – Christian Jun 12 '12 at 22:37
  • Stefan Jelkovich -- Would you please revise the plots in your suggested edit so that their x-axes run from 0 to 1, and also indicate in the body of the post which part is an addendum by you? Then resubmit, and I'll accept the edits. (You'll be able to recover the edits you already made by clicking the "Edit" button and reviewing the history of changes it reveals.) Thanks. – Josh O'Brien May 12 '22 at 16:05
  • Stefan Jelkovich -- After taking a closer look, I rolled back your edits, as the results you presented don't address the question that this post is answering. (Have a look at the probabilities on your figures' x-axes to see what's going wrong with your solutions.) – Josh O'Brien May 12 '22 at 19:14
11

just random 2 digits from (0, 1) and if assume its a and b then you got:

rand1 = min(a, b)
rand2 = abs(a - b)
rand3 = 1 - max(a, b)
ddzialak
  • 1,042
  • 7
  • 15
  • Additionally, you have to repeat generating second number if a == b ... (should be VERY rare case) – ddzialak Jun 12 '12 at 20:13
  • @user so a=0.85, b=0.99 then you got numbers: 0.85, 0.14, 0.01 (as for me these are very good 3 random numbers from 0..1) – ddzialak Jun 12 '12 at 20:15
  • 3
    The resulting distribution seems to be not exactly trivial: http://www.jstor.org/discover/10.2307/2983572?uid=2129&uid=2&uid=70&uid=4&sid=21100849643501 and a later paper that can be freely accessed http://doc.utwente.nl/70657/1/Sleutel67random.pdf – Christian Jun 12 '12 at 20:45
9

When you want to randomly generate numbers that add to 1 (or some other value) then you should look at the Dirichlet Distribution.

There is an rdirichlet function in the gtools package and running RSiteSearch('Dirichlet') brings up quite a few hits that could easily lead you to tools for doing this (and it is not hard to code by hand either for simple Dirichlet distributions).

Greg Snow
  • 48,497
  • 6
  • 83
  • 110
6

I guess it depends on what distribution you want on the numbers, but here is one way:

diff(c(0, sort(runif(2)), 1))

Use replicate to get as many sets as you want:

> x <- replicate(5, diff(c(0, sort(runif(2)), 1)))
> x
           [,1]       [,2]      [,3]      [,4]       [,5]
[1,] 0.66855903 0.01338052 0.3722026 0.4299087 0.67537181
[2,] 0.32130979 0.69666871 0.2670380 0.3359640 0.25860581
[3,] 0.01013117 0.28995078 0.3607594 0.2341273 0.06602238
> colSums(x)
[1] 1 1 1 1 1
ALiX
  • 1,021
  • 5
  • 9
5

I would simply randomly select 3 numbers from uniform distribution and then divide by their sum:

n <- 3
x <- runif(n, 0, 1)
y <- x / sum(x)
sum(y) == 1

n could be any number you like.

fiona
  • 59
  • 1
  • 2
  • 6
2

This problem and the different solutions proposed intrigued me. I did a little test of the three basic algorithms suggested and what average values they would yield for the numbers generated.

choose_one_and_divide_rest
means:                [ 0.49999212  0.24982403  0.25018384]
standard deviations:  [ 0.28849948  0.22032758  0.22049302]
time needed to fill array of size 1000000 was 26.874945879 seconds

choose_two_points_and_use_intervals
means:                [ 0.33301421  0.33392816  0.33305763]
standard deviations:  [ 0.23565652  0.23579615  0.23554689]
time needed to fill array of size 1000000 was 28.8600130081 seconds

choose_three_and_normalize
means:                [ 0.33334531  0.33336692  0.33328777]
standard deviations:  [ 0.17964206  0.17974085  0.17968462]
time needed to fill array of size 1000000 was 27.4301018715 seconds

The time measurements are to be taken with a grain of salt as they might be more influenced by the Python memory management than by the algorithm itself. I'm too lazy to do it properly with timeit. I did this on 1GHz Atom so that explains why it took so long.

Anyway, choose_one_and_divide_rest is the algorithm suggested by Andrie and the poster of the question him/herself (AND): you choose one value a in [0,1], then one in [a,1] and then you look what you have left. It adds up to one but that's about it, the first division is twice as large as the other two. One might have guessed as much ...

choose_two_points_and_use_intervals is the accepted answer by ddzialak (DDZ). It takes two points in the interval [0,1] and uses the size of the three sub-intervals created by these points as the three numbers. Works like a charm and the means are all 1/3.

choose_three_and_normalize is the solution by Anders Gustafsson and Josh O'Brien (JOB). It just generates three numbers in [0,1] and normalizes them back to a sum of 1. Works just as well and surprisingly a little bit faster in my Python implementation. The variance is a bit lower than for the second solution.

There you have it. No idea to what beta distribution these solutions correspond or which set of parameters in the corresponding paper I referred to in a comment but maybe someone else can figure that out.

Community
  • 1
  • 1
Christian
  • 499
  • 6
  • 24
1

The simplest solution is the Wakefield package probs() function

probs(3) will yield a vector of three values with a sum of 1

given that you can rep(probs(3),x) where x is "over and over"

no drama

Peter King
  • 91
  • 8
  • Great package - thanks for mentioning, @peter-king. Not really on topic, but I wonder if there is a solution to get n random integers summing up to a given integer x. – Stefan Jelkovich May 12 '22 at 06:43