22

Sampling uniformly at random from an n-dimensional unit simplex is the fancy way to say that you want n random numbers such that

  • they are all non-negative,
  • they sum to one, and
  • every possible vector of n non-negative numbers that sum to one are equally likely.

In the n=2 case you want to sample uniformly from the segment of the line x+y=1 (ie, y=1-x) that is in the positive quadrant. In the n=3 case you're sampling from the triangle-shaped part of the plane x+y+z=1 that is in the positive octant of R3:

(Image from http://en.wikipedia.org/wiki/Simplex.)

Note that picking n uniform random numbers and then normalizing them so they sum to one does not work. You end up with a bias towards less extreme numbers.

Similarly, picking n-1 uniform random numbers and then taking the nth to be one minus the sum of them also introduces bias.

Wikipedia gives two algorithms to do this correctly: http://en.wikipedia.org/wiki/Simplex#Random_sampling (Though the second one currently claims to only be correct in practice, not in theory. I'm hoping to clean that up or clarify it when I understand this better. I initially stuck in a "WARNING: such-and-such paper claims the following is wrong" on that Wikipedia page and someone else turned it into the "works only in practice" caveat.)

Finally, the question: What do you consider the best implementation of simplex sampling in Mathematica (preferably with empirical confirmation that it's correct)?

Related questions

Community
  • 1
  • 1
dreeves
  • 26,430
  • 45
  • 154
  • 229
  • It seems there are several methods that work fine - the only real differentiation is in speed and read-ability. What are your criterion other than 'best'? – zdav Jun 10 '10 at 18:08
  • Speed and readability are great criteria! Conciseness could be another. If you have an implementation that has anything at all going for it, go ahead and post it as an answer. – dreeves Jun 10 '10 at 22:35
  • 2
    I think the Wikipedia warning is a little bogus; the authors of the paper cited are worrying about perfect uniformity for a *discretized* version of this problem. The 2nd algorithm described is perfectly correct from a mathematical point of view, and should work well in practice if you're prepared to regard 'random floating-point number from [0, 1]' as a good-enough approximation to 'random real number from [0, 1]'. – Mark Dickinson Jun 11 '10 at 10:17
  • 2
    the link to sampling is dead – Bogdan Lataianu Feb 21 '13 at 17:39
  • [Related](http://math.stackexchange.com/q/32618/2380). – Wok Apr 30 '13 at 11:48
  • Have you some intuitions about how to generalize the problem solution to the non-unit simplicies and maybe even convex polytopes? – Tomilov Anatoliy Jul 17 '14 at 08:49
  • There is [different approach][1], which requires only n-1 uniformely distributed values. [1]: http://math.stackexchange.com/a/871220/54348 – Tomilov Anatoliy Jul 18 '14 at 20:22

6 Answers6

13

This code can work:

samples[n_] := Differences[Join[{0}, Sort[RandomReal[Range[0, 1], n - 1]], {1}]]

Basically you just choose n - 1 places on the interval [0,1] to split it up then take the size of each of the pieces using Differences.

A quick run of Timing on this shows that it's a little faster than Janus's first answer.

Sophie Alpert
  • 139,698
  • 36
  • 220
  • 238
  • Thanks! I think this is pretty much isomorphic to the one I posted. Thanks for the speed comparison, btw! – dreeves Jun 20 '10 at 22:23
  • Oh, indeed it is! Somehow I thought that yours was different, but it looks the same now that I've taken another look. – Sophie Alpert Jun 21 '10 at 00:15
  • Are there generalizations on non-unit simplexes or even (simplicial) convex polytopes? – Tomilov Anatoliy Jul 17 '14 at 08:27
  • @Orient Can't you just scale up the result to whatever size you want? – Sophie Alpert Jul 17 '14 at 08:38
  • @BenAlpert I can simply scale the result up to, for the one hand, triangulation of desired convex polytope in accordance with hypervolumes of the produced simplices, but, on the other hand, I cannot simply deform the unit simplex into arbitrary simplex by means of only the linear transformations and hold the uniformity of spatial distributions by doing so. – Tomilov Anatoliy Jul 17 '14 at 08:43
  • I found [the solution](http://math.stackexchange.com/a/871220/54348) of general problem. – Tomilov Anatoliy Jul 19 '14 at 07:31
  • This is the method I came up with on my own, then I went online to search for something more efficient. I found a great one on the Mathematica page, posted below. – Jerry Guern Aug 18 '21 at 16:18
9

After a little digging around, I found this page which gives a nice implementation of the Dirichlet Distribution. From there it seems like it would be pretty simple to follow Wikipedia's method 1. This seems like the best way to do it.

As a test:

In[14]:= RandomReal[DirichletDistribution[{1,1}],WorkingPrecision->25]
Out[14]= {0.8428995243540368880268079,0.1571004756459631119731921}
In[15]:= Total[%]
Out[15]= 1.000000000000000000000000

A plot of 100 samples:

alt text http://www.public.iastate.edu/~zdavkeos/simplex-sample.png

brainjam
  • 18,863
  • 8
  • 57
  • 82
zdav
  • 2,752
  • 17
  • 15
7

I'm with zdav: the Dirichlet distribution seems to be the easiest way ahead, and the algorithm for sampling the Dirichlet distribution which zdav refers to is also presented on the Wikipedia page on the Dirichlet distribution.

Implementationwise, it is a bit of an overhead to do the full Dirichlet distribution first, as all you really need is n random Gamma[1,1] samples. Compare below
Simple implementation

SimplexSample[n_, opts:OptionsPattern[RandomReal]] :=
  (#/Total[#])& @ RandomReal[GammaDistribution[1,1],n,opts]

Full Dirichlet implementation

DirichletDistribution/:Random`DistributionVector[
 DirichletDistribution[alpha_?(VectorQ[#,Positive]&)],n_Integer,prec_?Positive]:=
    Block[{gammas}, gammas = 
        Map[RandomReal[GammaDistribution[#,1],n,WorkingPrecision->prec]&,alpha];
      Transpose[gammas]/Total[gammas]]

SimplexSample2[n_, opts:OptionsPattern[RandomReal]] := 
  (#/Total[#])& @ RandomReal[DirichletDistribution[ConstantArray[1,{n}]],opts]

Timing

Timing[Table[SimplexSample[10,WorkingPrecision-> 20],{10000}];]
Timing[Table[SimplexSample2[10,WorkingPrecision-> 20],{10000}];]
Out[159]= {1.30249,Null}
Out[160]= {3.52216,Null}

So the full Dirichlet is a factor of 3 slower. If you need m>1 samplepoints at a time, you could probably win further by doing (#/Total[#]&)/@RandomReal[GammaDistribution[1,1],{m,n}].

dreeves
  • 26,430
  • 45
  • 154
  • 229
Janus
  • 5,421
  • 2
  • 26
  • 37
7

Here's a nice concise implementation of the second algorithm from Wikipedia:

SimplexSample[n_] := Rest@# - Most@# &[Sort@Join[{0,1}, RandomReal[{0,1}, n-1]]]

That's adapted from here: http://www.mofeel.net/1164-comp-soft-sys-math-mathematica/14968.aspx (Originally it had Union instead of Sort@Join -- the latter is slightly faster.)

(See comments for some evidence that this is correct!)

dreeves
  • 26,430
  • 45
  • 154
  • 229
  • I just ran a test, and it seems to work ok - the values are pretty uniform and all on the right line. I'm not sure why it was downvoted. – zdav Jun 10 '10 at 18:06
  • The downvote was mine, and was accidental; I apologise. If you can make a trivial edit to your answer, I'll upvote. This method looks correct to me. – Mark Dickinson Jun 11 '10 at 10:12
  • I was just going to post this same method. On my machine it is eight times faster than the method sampling Gamma[1,1]. – Timo Jun 11 '10 at 18:18
  • 2
    Instead of doing `Rest - Most`, you can also just call the built-in `Differences` and that's the same thing. – Sophie Alpert Jun 21 '10 at 00:16
3

I have created an algorithm for uniform random generation over a simplex. You can find the details in the paper in the following link: http://www.tandfonline.com/doi/abs/10.1080/03610918.2010.551012#.U5q7inJdVNY

Briefly speaking, you can use following recursion formulas to find the random points over the n-dimensional simplex:

x1=1-R11/n-1

xk=(1-Σi=1kxi)(1-Rk1/n-k), k=2, ..., n-1

xn=1-Σi=1n-1xi

Where R_i's are random number between 0 and 1.

Now I am trying to make an algorithm to generate random uniform samples from constrained simplex.that is intersection between a simplex and a convex body.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
0

Old question, and I'm late to the party, but this method is much faster than the accepted answer if implemented efficiently.

In Mathematica code: #/Total[#,{2}]&@Log@RandomReal[{0,1},{n,d}]

In plain English, you generate n rows * d columns of randoms uniformly distributed between 0 and 1. Then take the Log of everything. Then normalize each row, dividing each element in the row by the row total. Now you have n samples uniformly distributed over the (d-1) dimensional simplex.

If found this method here: https://mathematica.stackexchange.com/questions/33652/uniformly-distributed-n-dimensional-probability-vectors-over-a-simplex

I'll admit, I'm not sure why it works, but it passes every statistical test I can think of. If anyone has a proof of why this method works, I'd love to see it!