4

I have 3 variables: X,Y,Z. I want to generate random numbers, using Python, for X,Y,Z where X+Y+Z=1.

However the ranges for each variable is different e.g. X must range between 0.71 and 0.72, Y must range between 0.23 and 0.24, Z must range between 0.05 and 0.06.

The precision of the variables doesn't matter so you can generate the numbers: X= 0.71613464 Y=0.23166405 Z=0.05220131

  • 4
    If you generate X and Y, then Z is fixed. Is there a priority for the variables? – mozway Sep 01 '23 at 11:30
  • 3
    This appears to be underspecified. If X and Y would be uniformly distributed, then `Z=1-X-Y` isn't. – MSalters Sep 01 '23 at 11:45
  • There is no priority for the variables – user1361488 Sep 01 '23 at 11:57
  • 1
    Note that you can simplify this problem by introducing XX=X-0.715, YY=Y-0.235, and ZZ=Z-0.055. Now you have XX+YY+ZZ=-0.005, and XX, YY and ZZ all have range [-0.005, +0.005]. This is a simple axis-aligned transform to make the distributions identical and symmetrical around zero. – MSalters Sep 01 '23 at 12:06
  • See also https://stackoverflow.com/questions/13008383/generate-3-uniform-random-variables-that-sum-to-0. Note that the question there asks about a random point in an equilateral triangle. But the X+Y+Z plane can intersect the cube also near its middle, where the intersection is not a triangle but a hexagon. – MSalters Sep 01 '23 at 12:23
  • If the variables are not prioritized, I'd simply generate each variable within its range, then divide all of them by their sum to normalize them. Though, now I think about it, there might be edge cases where the normalization pushes some vars out of their ranges... – Swifty Sep 01 '23 at 12:35
  • @Swifty: It appears this question isn't that rigorous. For instance, 0.71 is not exactly representable anyway. It already has a rounding error. – MSalters Sep 01 '23 at 12:59

6 Answers6

9

The problem

We want to select 3 coordinates within a specific range. However, once we select X and Y, Z is fixed.

uniform sampling on a triangle

We cannot have full degrees of liberty, for instance if we picked X = 0.72 and Y = 0.24, then Z cannot exist within the desired range.

If we pick X and Y randomly and compute Z, in many cases Z will be incorrect:

size = 10_000

X = np.random.uniform(0.71, 0.72, size=size)
Y = np.random.uniform(0.23, 0.24, size=size)
Z = 1-(X+Y)

incorrect sampling on a triangle

The question is: how to solve this?

oversampling

This is usually not a recommended approach, but we can use trial and error, here we need to sample a bit more than twice to hope having enough valid points (sampling on a square, keeping half of the points):

target = 100_000
factor = 2.2  # the exact value depends on the target
              # there is always a risk that we miss a few points

size = int(target*factor)
X = np.random.uniform(0.71, 0.72, size=size)
Y = np.random.uniform(0.23, 0.24, size=size)
Z = 1-(X+Y)
idx = np.arange(size)[(Z>=0.05) & (Z<=0.06)][:target]
X = X[idx]
Y = Y[idx]
Z = Z[idx]

assert np.allclose(X+Y+Z, 1)

uniform sampling on a triangle by trial and error

using math

What we are looking for is equivalent to generating uniform points on a triangle.

You can use this formula:

uniform sampling on a triangle using a formula

size = 10_000
limits = [(0.71, 0.72), (0.23, 0.24), (0.05, 0.06)]

MIN, MAX = np.array(limits).T
corners = np.repeat(MIN[None], 3, axis=0)
np.fill_diagonal(corners, MAX)

# or setting the corners manually:
# corners = np.array([[0.72, 0.23, 0.05],
#                     [0.71, 0.24, 0.05],
#                     [0.71, 0.23, 0.06]])

r1 = np.sqrt(np.random.random(size=size))
r2 = np.random.random(size=size)

X, Y, Z = (corners[:,None] * np.array([(1-r1), r1*(1-r2), r2*r1])[...,None]
          ).sum(axis=0).T

assert np.allclose(X+Y+Z, 1)

Alternative using einsum for the last step:

X, Y, Z = np.einsum('ik,ij->kj', corners, np.array([(1-r1), r1*(1-r2), r2*r1]))

enter image description here

(The black dots are the corners of the triangle)

Note that for this approach to work the sum of the minimum boundaries of two coordinates + the maximum boundary of the third coordinate must be equal to 1 (Xmax+Ymin+Zmin == Xmin+Ymax+Zmin == Xmin+Ymin+Zmax == 1). If this is not the case, ignore the MAX and use corners = (1-np.identity(3))*MIN ; np.fill_diagonal(corners, 1-corners.sum(axis=1)).

mozway
  • 194,879
  • 13
  • 39
  • 75
1

This is simple way to generate truly random variables.

import random


def find_xyz():
    while True:
        x = random.uniform(0.71, 0.72)
        y = random.uniform(0.23, 0.24)
        z = 1 - x - y

        if 0.05 < z < 0.06:
            return x, y, z
    

if __name__ == "__main__":
    x, y, z = find_xyz()
    print(x, y, z)
milosz-k
  • 11
  • 1
1

You could convert each range to a zero based span of their possible delta from the minimum (range of freedom). You know each minimum value so those can be excluded from the calculation and added back in after randomly selecting an offset from their base (minimum value). This makes the problem a little simpler by having all minimums set to zero and only the x,y,z offsets needing to add up to the remaining range.

xMin,xMax = 0.71, 0.72
yMin,yMax = 0.23, 0.24
zMin,zMax = 0.05, 0.06

baseMin = xMin + yMin + zMin   # 0.99
total = 1 - baseMin            # 0.01
xSpan = xMax-xMin              # 0 ... 0.01
ySpan = yMax-yMin              # 0 ... 0.01
zSpan = zMax-zMin              # 0 ... 0.01

import random

xDelta = random.uniform(max(0,total-xSpan-zSpan), xSpan)
yDelta = random.uniform(max(0,total-xDelta-zSpan), min(ySpan, total-xDelta))
zDelta = total - xDelta - yDelta

X = xMin + xDelta
Y = yMin + yDelta
Z = zMin + zDelta

result:

print(X,Y,Z,"=",X+Y+Z)
print("X is valid: ", xMin <= X <= xMax)
print("Y is valid: ", yMin <= Y <= yMax)
print("Y is valid: ", zMin <= Z <= zMax)

0.7170555223349833 0.23168125318306362 0.05126322448195306 = 1.0
X is valid:  True
Y is valid:  True
Y is valid:  True

The first random value (xDelta) is constrained by the maximum contribution that can be made by the two others towards the total delta. the second random value (yDelta) is constrained by the (now known) first value and the implicit range of the remaining value. The last value has no freedom at that point and must be the remainder of the total delta. This last value will be within it's specified span because of constraints applied to the previous two.

Note that your example has the same (fully overlapping) range of freedom on all 3 values (0 ... 0.01) but this solution should also work if the ranges are of different size and some of them don't fully overlap the total freedom range

Unfortunately, this approach des not give a regular distribution (as measured by mozway).

In the special case where all 3 ranges have the same interval size and correspond to the difference with the total minimums (i.e. full overlap in your example 0.01), I was able to find a way to produce a regular distribution by randomly picking two points in the 0 ... 0.01 interval and using the 3 parts of the split as the delta values:

def randXYZ2(xRange,yRange,zRange):
        
    xMin,xMax = xRange
    yMin,yMax = yRange
    zMin,zMax = zRange

    baseMin = xMin + yMin + zMin   # 0.99
    total = 1 - baseMin            # 0.01

    p0 = random.uniform(0,total)
    p1 = random.uniform(0,total)
    if p0>p1: p0,p1 = p1,p0
    
    xDelta = p0       
    yDelta = p1-p0
    zDelta = total - xDelta - yDelta

    X = xMin + xDelta
    Y = yMin + yDelta
    Z = zMin + zDelta

    return X,Y,Z

output:

size = 3000
xRange = 0.71, 0.72
yRange = 0.23, 0.24
zRange = 0.05, 0.06
samples = [ randXYZ2(xRange,yRange,zRange) for _ in range(size) ]
xs,ys,zs = zip(*samples)

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xs,ys,zs,  marker="o", s=1)
plt.show()

enter image description here

Alain T.
  • 40,517
  • 4
  • 31
  • 51
  • xDelta is uniform over its full range but yDelta isn't... that doesn't seem fair. – Kelly Bundy Sep 01 '23 at 14:47
  • I also though about that but I'm not sure it is actually biased given that there is a mutual constraint between the values which inevitably impacts the uniformity of others based on any specific value for one of them. At least, the constrained range of the second value is chosen uniformly within its remaining degree of freedom. (the maths for this are beyond my level) – Alain T. Sep 01 '23 at 14:52
  • Someone with the time and inclination could perhaps make a statistical comparison of this with the less efficient (but more obviously fair) trial and error approach. – Alain T. Sep 01 '23 at 14:56
  • 2
    It's working but is [not uniform](https://i.stack.imgur.com/cLU45.png) – mozway Sep 01 '23 at 20:13
0

Could np.random.dirichlet work?

X, Y, Z = 0.01 * np.random.dirichlet([1, 1, 1]) + [0.71, 0.23, 0.05]
Chrysophylaxs
  • 5,818
  • 3
  • 10
  • 21
-1
import random

X, Y, Z = 0.0, 0.0, 0.0

epsilon = 0.01

target = 1

while True:
    
    X = random.uniform(0.71, 0.72)
    Y = random.uniform(0.23, 0.24)
    Z = random.uniform(0.05, 0.06)
    
    if target - epsilon <= (X + Y + Z) <= target + epsilon:
        print(X, Y, Z)

upd:

import random

epsilon = 1000

x_d, x_u = 71 * epsilon, 72 * epsilon
y_d, y_u = 23 * epsilon, 24 * epsilon
z_d, z_u = 5 * epsilon, 6 * epsilon


target = 1

while True:
    
    X = random.randint(x_d, x_u)
    Y = random.randint(y_d, y_u)
    Z = random.randint(z_d, z_u)
    
    if (X + Y + Z) == target*epsilon:
        print(X / epsilon, Y / epsilon, Z / epsilon)

Fedor
  • 19
  • 4
  • 1
    That will be highly inefficient if you want to sample a large population. – mozway Sep 01 '23 at 11:49
  • @mozway: It depends: what's the rejected proportion ? In this case, you're looking at the plane `X+Y+Z=1` intersecting an axis-aligned cube of size 0.01. Since the epsilon is huge (0.01), this rejects almost nothing. But because of this high epsilon, the code is almost equal to ignoring the epsilon altogether. Note that the smallest possible triplet 0.71+0.23+0.05=0.99 is still within epsilon. You require less than two attempts on average. – MSalters Sep 01 '23 at 11:54
  • However, if you transform this to a UV plane orthogonal to (1,1,1) and uniformly pick from a square surrounding the cube, then reject points outside the cube, then you need about 4 attempts and zero epsilon. Since your points are uniform in UV and the transform is linear, your joint distribution is still uniform. – MSalters Sep 01 '23 at 12:01
  • 1
    @MSalters I was commenting on the general approach, generating random values by trial and error is almost never the good approach – mozway Sep 01 '23 at 12:10
  • @mozway: It's a decent enough strategy when you reject <50% of your tries. E.g. for a point in the unit circle, drawing uniformly from the surrounding square will reject (1-pi/4), the areas near the corners. That means you need about 1.3 attempt on average. And the reject test is just a cheap `x*x+y*y>=1` – MSalters Sep 01 '23 at 12:15
  • @MSalters But this solution doesn't meet the requirement of numbers summing to 1. – matszwecja Sep 01 '23 at 12:22
  • Using float numbers you never can't reach eps = 0.0, so for `eps = 1e-6` I iterate over 1e+8 values and find 1e+4 random points and for a lot of tasks it would be okay. btw it takes about `1 min. – Fedor Sep 01 '23 at 12:33
  • @MSalters what do you think about my updated solution? – Fedor Sep 01 '23 at 13:01
  • @mozway what do you think about my updated solution? – Fedor Sep 01 '23 at 13:02
  • @Fedor: The integer solution has a weirdly-named `epsilon`. But now the problem surfaces which mozway correctly identified: You're rejecting 99.9999% of all triplets. – MSalters Sep 01 '23 at 13:14
-1

Here is the program -

import random
X = random.uniform(0.71, 0.72)
Y = random.uniform(0.23, 0.24)
Z = 1 - X - Y

# Adjust Z if it is outside the range [0.05, 0.06]
if Z < 0.05:
    Z = 0.05
elif Z > 0.06:
    Z = 0.06

# Print the values of X, Y, and Z
print(f"X = {X:.8f}, Y = {Y:.8f}, Z = {Z:.8f}")

Each time, the values of all the 3 variables change, but add up to 1.

Hope this helps.

Nob0Dy
  • 54
  • 6
  • 3
    You forgot to adjust X or Y when adjusting Z. Note that this implementation will affect the distribution. In particular, Z=0.05 and Z=0.06 will be far more likely than any other Z value. – MSalters Sep 01 '23 at 11:47
  • The distribution is actually fine for the valid points, but indeed [half of the points are invalid](https://i.stack.imgur.com/eYl4O.png) – mozway Sep 01 '23 at 13:58