1

I am trying to figure out how to sample for two random variables uniformly in the region where the sum of the two is greater than zero. I thought a solution might be to sample for X~U(-1,1) and then sample for Y~U(-x,1) where x would be the current sample for X.

But this resulted in a distribution that looks like this.

This doesn't look uniformly distributed as the density of points at the top left is higher and keeps reducing as we move to the right. Can someone point out where the flaw in my reasoning is and how to possibly fix this?

Thank you

krishna
  • 135
  • 1
  • 7
  • From your figure, it appears that in the region of interest, x=+0.75 is much more "likely" than x=-0.75. Hence, your "x" is not uniformly distributed. So you should not sample x thru an uniform distribution. A general way for sampling the proper distribution for x is the [Inverse Method](https://en.wikipedia.org/wiki/Inverse_transform_sampling). But I am not sure it is a good thing here to break the symmetry between x and y in your problem. Generally, you tend to *use* the symmetries in the problem. – jpmarinier Jul 13 '21 at 08:27

3 Answers3

3

You just need to make sure that adjust the density of x points away from the "top-left" corner appropriately. I'd also suggest generating in [0,1] and then transforming into [-1,1] afterwards.

For example:

import numpy as np

# generate points, sqrt takes care of moving points away from zero
n = 50000
x = np.sqrt(np.random.uniform(size=n))
y = np.random.uniform(1-x)

# transform to -1,1
x = x * 2 - 1
y = y * 2 - 1

plotting these gives:

matplotlib output

which looks reasonable to me. Note I've colored the [-1,1] square to show where it should fit.

Sam Mason
  • 15,216
  • 1
  • 41
  • 60
  • Thank you for the answer! I am trying to treat this as an exercise to understand sampling better. Could you please elaborate a bit on how you arrived at the answer? – krishna Jul 13 '21 at 21:30
  • @krishna working in [0,1] makes more math functions naturally do the "right thing". for the use of square root notice that half way on the x-axis there is half as much space on the y-axis. I've just found http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf which suggests I might not be doing the right thing, but the above plot looks convincing to me and certainly better than what you had – Sam Mason Jul 14 '21 at 09:50
2
Could you please elaborate a bit on how you arrived at the answer?

Well, the main problem consists in getting a fair way to sample the non-uniform distribution of coordinate X.

From elementary geometry, the area of the part of the upper triangle with x < x0 is: (1/2) * (x0 + 1)2. As the total area of this upper triangle is equal to 2, it follows that the cumulative probability P of (X < x0) within the upper triangle is: P = (1/4) * (x0 + 1)2.

So, inverting the last formula, we have: x0 = 2*sqrt(P) - 1

Now, from the Inverse Transform Sampling theorem, we know that we can generate a fair sampling of X by reinterpreting P as a random variable U0 uniformly distributed between 0 and 1.

In Python, this gives us:

    u0 = random.uniform(0.0, 1.0)
    x = (2*math.sqrt(u0)) - 1.0

or equivalently:

    u0 = random.random()
    x  = (2 * math.sqrt(u0)) - 1.0

Note that this is essentially the same maths as in the excellent answer by @SamMason. That thing comes from a general statistical principle. It can just as well be used to prove that a fair sampling of the latitude on a 3D sphere is given by arcsin(2*u - 1).

So now we have x, but we still need y. The underlying 2D density is an uniform one, so for a given x, all possible values of y are equidistributed.

The interval of possible values for y is [-x, 1]. So if U1 is yet another independent random variable uniformly distributed between 0 and 1, y can be drawn from the equation:

y = (1+x) * u1 - x

which in Python is rendered by:

    u1 = random.random()
    y  = (1+x)*u1 - x

Overall, the Python code can be written like this:

import  math
import  random
import  matplotlib.pyplot  as  plt

def mySampler():
    u0 = random.random()
    u1 = random.random()
    x  = 2*math.sqrt(u0) - 1.0
    y  = (1+x)*u1 - x
    return (x,y)

#--- Main program:

points = (mySampler()  for _ in range(10000))  # an iterator object

xx, yy = zip(*points)

plt.scatter(xx, yy, s=0.2)
plt.show()

Graphically, the result looks good enough: Result of Python code

Side note: a cheaper, ad hoc solution:

There is always the possibility of sampling uniformly in the whole square, and rejecting the points whose x+y sum happens to be negative. But this is a bit wasteful. We can have a more elegant solution by noting that the “bad” region has the same shape and area as the “good” region.

So if we get a “bad” point, instead of just rejecting it, we can replace it by its symmetic point with respect to the x+y=0 dividing line. This can be done using the following Python code:

def mySampler2():
    x0 = random.uniform(-1.0, 1.0)
    y0 = random.uniform(-1.0, 1.0)
    s  = x0+y0
    if (s >= 0):
      return (x0, y0)       # good point
    else:
      return (x0-s, y0-s)   # symmetric of bad point

This works fine too. And this is probably the cheapest possible solution regarding CPU time, as we reject nothing, and we don't need to compute a square root.

jpmarinier
  • 4,427
  • 1
  • 10
  • 23
  • 1
    much better than my hand-wavy explanation! also, mirroring the rejected samples is neat, I've always hard rejected variates before. using `sqrt` might be faster due to the lack of an unpredictable branch, but then again you could rearrange the rejection sampler to be more branch predictor friendly – Sam Mason Jul 15 '21 at 08:10
1

Following Generate random locations within a triangular domain

Code, to sample uniformly in any triangle, Python 3.9.4, Win 10 x64

import math
import random

import matplotlib.pyplot as plt

def trisample(A, B, C):
    """
    Given three vertices A, B, C,
    sample point uniformly in the triangle
    """
    r1 = random.random()
    r2 = random.random()

    s1 = math.sqrt(r1)

    x = A[0] * (1.0 - s1) + B[0] * (1.0 - r2) * s1 + C[0] * r2 * s1
    y = A[1] * (1.0 - s1) + B[1] * (1.0 - r2) * s1 + C[1] * r2 * s1

    return (x, y)

random.seed(312345)
A = (1, 0)
B = (1, 1)
C = (0, 1)
points = [trisample(A, B, C) for _ in range(10000)]

xx, yy = zip(*points)
plt.scatter(xx, yy, s=0.2)
plt.show()

enter image description here

Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64