Sampling of positive weights with sum unity and an equality constraint

Question

Suppose I have a vector of positive weights a=(a1, a2, a3, a4) such that a2=a3 and a1+a2+a3+a4=1. Is there any way to sample this kind of weights using R? I tried to think about using the Dirichlet distribution, but it gives no mechanism to force two of the variates to be equal.

josliber · Accepted Answer · 2015-09-16T14:33:26.310

To sample evenly across the set {(a1, a2, a3, a4 | a2=a3, a1+a2+a3+a4=1, a1>0, a2>0, a3>0, a4>0}, I would first sample the value for a2 (which is equal to a3). To do this we need to know the distribution of this value. If a2 = a3 = r, then we have a1+a4 = 1-2r; for positive a1 and a4 there is a line segment of length (1-2k)*sqrt(2) containing all feasible values of a1 and a4. Integrating, the probability that a2 is k or less is 4(k - k^2). In more detail:

Prob (a2 <= k) = Integral(0 to k) (1-2r)*sqrt(2) dr / Integral(0 to 0.5) (1-2r)*sqrt(2) dr
               = ((k-k^2)*sqrt(2)) / (sqrt(2)/4)
               = 4k - 4k^2

Thus, we can sample values for a2 by selecting a uniformly distributed value u~U(0, 1) and setting a2 to equal the value k for which 4k - 4k^2 = u. Solving via the quadratic formula, this yields:

a2 = 0.5 * (1 - sqrt(1-u))

In R, we can sample 1000 values for a2 with:

set.seed(144)
a2 <- 0.5 * (1 - sqrt(1 - runif(1000)))
a3 <- a2

Given a fixed value a2 = a3 = k, the value of a1 is uniformly distributed in [0, 1-2k]:

a1 <- runif(1000) * (1 - 2*a2)

Having specified a1, a2, and a3, there is only one possible value for a4:

a4 <- 1 - a1 - a2 - a3

We can take a look at some of our sampled values:

head(cbind(a1, a2, a2, a4))
#              a1         a2         a2         a4
# [1,] 0.83455239 0.01251016 0.01251016 0.14042729
# [2,] 0.02744599 0.22932773 0.22932773 0.51389856
# [3,] 0.45835472 0.23860119 0.23860119 0.06444291
# [4,] 0.36843649 0.14679703 0.14679703 0.33796946
# [5,] 0.35109881 0.08702039 0.08702039 0.47486041
# [6,] 0.02916818 0.19942616 0.19942616 0.57197949

Here is the distribution of a1 values (note that by symmetry this is identical to the distribution of the a4 values). Because we select a1 uniformly in the range [0, 1-2*a2], lower values are more common than higher values:

Here is the distribution of a2 values (by definition this is the same as the distribution of the a3 values). The shape of the distribution is similar to the one of a1, but the maximum value is 0.5:

You have assumed pdf of `a2` to be proportional to the length of the line segment containing all possible values for `a1` and `a4` i.e `(1-2a_2)sqrt(2)`. Is that correct ? As the length of the line segment increases the value of a_2 should decrease. — Dey, Sep 18 '15 at 05:02

score 1 · Answer 2 · edited May 23 '17 at 12:29

1

I tried to think about using the Dirichlet distribution,

Well, to me it looks like Dirichlet distribution.

but it gives no mechanism to force two of the variates to be equal.

but you don't have to. You actually have three variates from Dirichlet distribution - A, B, C, all >= 0, uniformly distributed U(0,1) so that A+B+C=1

After sampling (A, B, C) you just assign

a1 = A;
a2 = B/2.0;
a3 = B/2.0;
a4 = C;

Please take a look at how to sample thing (well, in Python)

Generating N uniform random numbers that sum to M

edited May 23 '17 at 12:29

Community

1
1

answered Sep 17 '15 at 18:00

Severin Pappadeux

18,636
3
38
64

Thanks @SeverinPappadeux. It is a good observation. I have a doubt though which I asked you in StackExchange as it was not a code related question. – Dey Sep 18 '15 at 04:54
Could you edit your answer to argue why this evenly samples the space `{(a1, a2, a3, a4 | a2=a3, a1+a2+a3+a4=1, a1>0, a2>0, a3>0, a4>0}`? I've computed random variates from my answer and from yours and they have different distributions (e.g. the correlations between pairs of variables differ). – josliber Sep 18 '15 at 05:41
@josilber `I've computed random variates from my answer and from yours and they have different distributions`. They might be. I believe we have a case of n=3 Dirichlet distribution. Dirichlet distribution has properties of x_i within[0...1] and Sum(x_i)=1, but also parametrised by \alpha_i. Simplest case, which is sampled in my python code is where all \alpha_i=1. I'm too lazy to check, but I wouldn't be surprised if you build Dirichlet distribution with different \alpha_i (\alpha_i = 0.5?). In such case your solution is a satisfactory answer (as well as mine) – Severin Pappadeux Sep 18 '15 at 16:39
`I believe we have a case of n=3 Dirichlet distribution.` Could you say why you believe that? I don't know if the set is a Dirichlet distribution (it's not obvious to me why it would be), but I'm pretty sure it is not a Dirichlet distribution with alpha_i=1 for all variables as you have currently suggested in your answer. Could you provide some mathematical reasoning for why you think it is Dirichlet distributed? – josliber Sep 18 '15 at 16:44

Sampling of positive weights with sum unity and an equality constraint

2 Answers2