Suppose I have a vector of positive weights a=(a1, a2, a3, a4)
such that a2=a3
and a1+a2+a3+a4=1
. Is there any way to sample this kind of weights using R? I tried to think about using the Dirichlet distribution, but it gives no mechanism to force two of the variates to be equal.
2 Answers
To sample evenly across the set {(a1, a2, a3, a4 | a2=a3, a1+a2+a3+a4=1, a1>0, a2>0, a3>0, a4>0}
, I would first sample the value for a2
(which is equal to a3
). To do this we need to know the distribution of this value. If a2 = a3 = r
, then we have a1+a4 = 1-2r
; for positive a1 and a4 there is a line segment of length (1-2k)*sqrt(2)
containing all feasible values of a1
and a4
. Integrating, the probability that a2
is k
or less is 4(k - k^2)
. In more detail:
Prob (a2 <= k) = Integral(0 to k) (1-2r)*sqrt(2) dr / Integral(0 to 0.5) (1-2r)*sqrt(2) dr
= ((k-k^2)*sqrt(2)) / (sqrt(2)/4)
= 4k - 4k^2
Thus, we can sample values for a2
by selecting a uniformly distributed value u~U(0, 1)
and setting a2
to equal the value k
for which 4k - 4k^2 = u
. Solving via the quadratic formula, this yields:
a2 = 0.5 * (1 - sqrt(1-u))
In R, we can sample 1000 values for a2
with:
set.seed(144)
a2 <- 0.5 * (1 - sqrt(1 - runif(1000)))
a3 <- a2
Given a fixed value a2 = a3 = k
, the value of a1
is uniformly distributed in [0, 1-2k]
:
a1 <- runif(1000) * (1 - 2*a2)
Having specified a1
, a2
, and a3
, there is only one possible value for a4
:
a4 <- 1 - a1 - a2 - a3
We can take a look at some of our sampled values:
head(cbind(a1, a2, a2, a4))
# a1 a2 a2 a4
# [1,] 0.83455239 0.01251016 0.01251016 0.14042729
# [2,] 0.02744599 0.22932773 0.22932773 0.51389856
# [3,] 0.45835472 0.23860119 0.23860119 0.06444291
# [4,] 0.36843649 0.14679703 0.14679703 0.33796946
# [5,] 0.35109881 0.08702039 0.08702039 0.47486041
# [6,] 0.02916818 0.19942616 0.19942616 0.57197949
Here is the distribution of a1
values (note that by symmetry this is identical to the distribution of the a4
values). Because we select a1
uniformly in the range [0, 1-2*a2]
, lower values are more common than higher values:
Here is the distribution of a2
values (by definition this is the same as the distribution of the a3
values). The shape of the distribution is similar to the one of a1
, but the maximum value is 0.5:

- 43,891
- 12
- 98
- 133
-
You have assumed pdf of `a2` to be proportional to the length of the line segment containing all possible values for `a1` and `a4` i.e `(1-2a_2)sqrt(2)`. Is that correct ? As the length of the line segment increases the value of a_2 should decrease. – Dey Sep 18 '15 at 05:02
I tried to think about using the Dirichlet distribution,
Well, to me it looks like Dirichlet distribution.
but it gives no mechanism to force two of the variates to be equal.
but you don't have to. You actually have three variates from Dirichlet distribution - A, B, C, all >= 0, uniformly distributed U(0,1) so that A+B+C=1
After sampling (A, B, C) you just assign
a1 = A;
a2 = B/2.0;
a3 = B/2.0;
a4 = C;
Please take a look at how to sample thing (well, in Python)

- 1
- 1

- 18,636
- 3
- 38
- 64
-
Thanks @SeverinPappadeux. It is a good observation. I have a doubt though which I asked you in StackExchange as it was not a code related question. – Dey Sep 18 '15 at 04:54
-
Could you edit your answer to argue why this evenly samples the space `{(a1, a2, a3, a4 | a2=a3, a1+a2+a3+a4=1, a1>0, a2>0, a3>0, a4>0}`? I've computed random variates from my answer and from yours and they have different distributions (e.g. the correlations between pairs of variables differ). – josliber Sep 18 '15 at 05:41
-
@josilber `I've computed random variates from my answer and from yours and they have different distributions`. They might be. I believe we have a case of n=3 Dirichlet distribution. Dirichlet distribution has properties of x_i within[0...1] and Sum(x_i)=1, but also parametrised by \alpha_i. Simplest case, which is sampled in my python code is where all \alpha_i=1. I'm too lazy to check, but I wouldn't be surprised if you build Dirichlet distribution with different \alpha_i (\alpha_i = 0.5?). In such case your solution is a satisfactory answer (as well as mine) – Severin Pappadeux Sep 18 '15 at 16:39
-
`I believe we have a case of n=3 Dirichlet distribution.` Could you say why you believe that? I don't know if the set is a Dirichlet distribution (it's not obvious to me why it would be), but I'm pretty sure it is not a Dirichlet distribution with alpha_i=1 for all variables as you have currently suggested in your answer. Could you provide some mathematical reasoning for why you think it is Dirichlet distributed? – josliber Sep 18 '15 at 16:44