5

I already know how to generate random numbers in a range. I can do this by using

rand.nextInt((max - min) + 1) + min;

The problem is that I would also like to set a standard deviation for these numbers. The numbers also need to be positive and they are not between 0 and 1

EDIT I removed the ThreadLocalRandom class because I cannot set a seed in that class and these random numbers should be reproducible in a different system.

statboy
  • 85
  • 8
  • 2
    Could this help? ... http://stackoverflow.com/questions/31754209/can-random-nextgaussian-sample-values-from-a-distribution-with-different-mean – Tim Biegeleisen Aug 25 '16 at 12:16
  • Hint: it took me less time to find a "duplicated" question than to read your question. I guess: writing your question took even longer. So please; do some prior research next time. – GhostCat Aug 25 '16 at 12:19
  • @GhostCat The "duplicated" answer doesnt solve my problem because I need numbers within a range. – statboy Aug 25 '16 at 12:34
  • 2
    @statboy You must specify what kind of distribution you'd like to have. Your `nextInt` example produces a [discrete uniform distribution](https://en.wikipedia.org/wiki/Discrete_uniform_distribution) whose variance (StdDev) is already determined by the `min` and `max` parameters (see the wikipedia link). – Stefan Zobel Aug 25 '16 at 12:43
  • 1
    With your latest edit, I deleted my answer because it clearly doesn't address the requirements you edited (after the fact) *The numbers also need to be positive and they are not between 0 and 1*. You will need to specify further what you mean by having a "standard deviation" with those restrictions. – Tunaki Aug 25 '16 at 14:44
  • Are you looking for integer or continuous results? – pjs Aug 25 '16 at 14:46
  • @pjs A continuous distribution. He stated that earlier in a comment to an answer that got deleted in the meanwhile. – Stefan Zobel Aug 25 '16 at 14:49
  • @StefanZobel The reason I asked for clarification is that he hasn't said so in the question itself, and his example of what he "can do" is integer. – pjs Aug 25 '16 at 16:28
  • @pjs Sure. Out of curiosity: do you know a distribution that can be bounded where the relation between mean / variance and the bounds is not so "convoluted" as in the truncated normal case? – Stefan Zobel Aug 25 '16 at 16:31
  • @StefanZobel One example would be the [triangular distribution](https://en.wikipedia.org/wiki/Triangular_distribution), where 'a' and 'b' define the range and the mode 'c' can be dialed between those limits to yield different variances, but there are limits on what variances are feasible because of the range limitation and the fact that the mode has to fall within the range. If you look at the bottom of the linked Wikipedia page for "Continuous univariate supported on a bounded interval," there are other alternatives, but again, not just any variance/std-dev is possible. – pjs Aug 25 '16 at 16:37
  • @pjs Yes, that also came to my mind. I think it doesn't get simpler than that. But it's sadly a bit restricted. – Stefan Zobel Aug 25 '16 at 16:40
  • 1
    @StefanZobel A scaled & translated [beta distribution](https://en.wikipedia.org/wiki/Beta_distribution) offers a lot of flexibility. – pjs Aug 25 '16 at 16:43
  • 1
    @pjs Also thought about that:) Do you have a link for the Var formulas in the translated / scaled case? Or is it simply the `Var(aX) = a*a Var(X)`, `Var(a + X) = Var(X)`math? – Stefan Zobel Aug 25 '16 at 16:49
  • 2
    @StefanZobel If OP wants to migrate a Beta to a range between 1 and `c`, scale it by `c-1` and add 1. The variance is affected exactly as you describe -- additive constants don't change it, and you scale the original Beta's variance by `(c-1)^2`, where caret denotes exponentiation. – pjs Aug 25 '16 at 17:17

1 Answers1

6

Choosing the standard deviation (or variance) for a bounded distribution can only be done subject to constraints that depend on the selected distribution and the bounds (min, max) of your interval. Some distributions may allow you to make the variance arbitrarily small (e.g. the Beta distribution), other distributions (like the Uniform distribution) don't allow any flexibility once the bounds (min, max) have been set. In any case, you'll never be able to make the variance arbitrarily large -- the bounds do prevent this (they'll always enter the expression for the distribution's variance).

I'll illustrate this for a very simple example that can be implemented without requiring any 3rd party libraries. Let's assume you want a symmetric distribution on the interval (min, max), symmetry implying that the mean E(X) of the distribution is located in the middle of the interval: E(X) = (min + max)/2.

Using Random's nextDouble as in x = a + (b - a) * rnd.nextDouble() will give you a uniformly distributed random variable in the interval a <= x < b that has a fixed variance Var(X) = (b - a)^2 / 12 (not what we want).

OTH, simulating a symmetric triangular distribution on the same interval (a, b) would give us a random variate whith the same mean but with only half the variance: Var(X) = (b - a)^2 / 24 (also fixed, so also not what we want).

A symmetric trapezoidal distribution with parameters (a < b < c < d) lies somewhere in the middle of a Uniform and a triangular distribution on the interval (a, d). The symmetry condition implies d - c = b - a, in the following I'll refer to the distance b - a as x or as "displacement" (I've made up that name, it's not a technical term).

If you let x approach 0.0 from above, the trapezoidal will begin to look very similar to a uniform distribution and its variance will tend to the maximum possible value (d - a)^2 / 12. If you let x approach the maximum possible value (d - a)/2 from below, the trapezoidal will look very similar to a symmetric triangle distribution and its variance will approach the minimum possible value of (d - a)^2 / 24) (but note that we should stay away a little from these extreme values in order not to break the variance formula or our algorithm for the trapezoidal).

So, the idea is to construct a trapezoidal distribution with a value for x that yields the standard deviation you want, given the condition that your targeted standard deviation must lie inside the open range (roughly) given by (0.2041(d - a), 0.2886(d - a)). For convenience let's assume that a = min = 2.0 and d = max = 10.0 which gives us this range of possible stddevs: (1.6328, 2.3088). Let's further assume that we want to construct a distribution with a stddev of 2.0 (which, of course, has to be in the admissible range).

Solving this requires 3 steps:

1) we need to have a formula for the variance given min, max and an admissible value for the displacement x

2) we need to somehow "invert" this expression to give us the value of x for our target variance

3) once we know the value of x we must construct a random variable that has a symmetric trapezoidal distribution with the parameters (min, max, x)

Step 1:

/**
 * Variance of a symmetric trapezoidal distribution with parameters
 * {@code a < b < c < d} and the length of {@code d - c = b - a}
 * (by symmetry) identified by {@code x}.
 * 
 * @param a support lower bound
 * @param d support upper bound
 * @param x length of {@code d - c = b - a}, constrained to lie in the open
 *          interval {@code (0, (d-a)/2)}
 * @return variance of the symmetric trapezoidal distribution defined by
 *         the triple {@code (a, d, x)}
 */
static double varSymTrapezoid(double a, double d, double x) {
    if (a <= 0.0 || d <= 0.0 || a >= d) {
        throw new IllegalArgumentException();
    }
    if (x <= 0.0 || x >= (d - a) / 2) {
        throw new IllegalArgumentException();
    }
    double b = a + x;
    double c = d - x;
    double b3 = pow(b, 3);
    double c3 = pow(c, 3);
    double ex2p1 = pow(b, 4) / 4 - a * b3 / 3 + pow(a, 4) / 12;
    double ex2p2 = (c3 / 3 - b3 / 3) * (d - c);
    double ex2p3 = pow(c, 4) / 4 - d * c3 / 3 + pow(d, 4) / 12;
    double ex2 = (ex2p1 + ex2p2 + ex2p3) / ((d - b) * (d - c));
    return ex2 - pow((a + d) / 2, 2);
}

Note that this formula is only valid for symmetric trapezoidal distributions. As an example, if you call this method with a displacement of 2.5 (varSymTrapezoid(2.0, 10.0, 2.5)) it'd give you back a variance of approximately 3.0416 which is too low (we need 4.0), meaning that a displacement of 2.5 is too much (higher displacements give lower variances).

The variance expression is a 4th order polynomial in x that I'd rather not want to solve analytically. However, for a target x in the admissible range this expression is monotonically decreasing, so we can construct a function that crosses zero for our target variance and solve this by simple bisection. This is

Step 2:

/**
 * Find the displacement {@code x} for the given {@code stddev} by simple
 * bisection.
 * @param min support lower bound
 * @param max support upper bound
 * @param stddev the standard deviation we want
 * @return the length {@code x} of {@code d - c = b - a} that yields a
 * standard deviation roughly equal to {@code stddev}
 */
static double bisect(double min, double max, double stddev) {
    final double eps = 1e-4;
    final double var = pow(stddev, 2);
    int iters = 0;
    double a = eps;
    double b = (max - min) / 2 - eps;
    double x = eps;
    double dx = b - a;

    while (abs(dx) > eps && iters < 150 && eval(min, max, x, var) != 0.0) {
        x = ((a + b) / 2);
        if ((eval(min, max, a, var) * eval(min, max, x, var)) < 0.0) {
            b = x;
            dx = b - a;
        } else {
            a = x;
            dx = b - a;
        }
        iters++;
    }
    if (abs(eval(min, max, x, var)) > eps) {
        throw new RuntimeException("failed to find solution");
    }
    return x;
}

/**
 * Function whose root we want to find.
 */
static double eval(double min, double max, double x, double var) {
    return varSymTrapezoid(min, max, x) - var;
}

Calling the bisect method with the desired value 2.0 for the standard deviation (bisect(2.0, 10.0, 2.0)) gives us the needed displacement: ~ 1.1716. Now that the value for x is known, the final thing we have to do is to construct a suitably distributed random variable which is

Step 3:

It is a well-known fact of probability theory that the sum of two independent uniformly distributed random variables X1 ~ U[a1, b1] and X2 ~ U[a2, b2] is a symmetric trapezoidally distributed random variable on the interval [a1 + a2, b1 + b2] provided that either a1 + b2 < a2 + b1 (case 1) or a2 + b1 < a1 + b2 (case 2). We have to avoid the case a2 + b1 = a1 + b2 (case 3) since then the sum has a symmetric triangular distribution which we don't want.

We'll choose case 1 (a1 + b2 < a2 + b1). In that case the length of b2 - a2 will be equal to the "displacement" x.

So, all we have to do is to choose the interval boundaries a1, a2, b1 and b2 such that a1 + a2 = min, b1 + b2 = max, b2 - a2 = x and the above inequality is fullfilled:

/**
 * Return a pseudorandom double for the symmetric trapezoidal distribution
 * defined by the triple {@code (min, max, x)}
 * @param min support lower bound
 * @param max support upper bound
 * @param x length of {@code max - c = b - min}, constrained to lie in the
 *          open interval {@code (0, (max-min)/2)}
 */
public static double symTrapezoidRandom(double min, double max, double x) {
    final double a1 = 0.5 * min;
    final double a2 = a1;

    final double b1 = max - a2 - x;
    final double b2 = a2 + x;

    if ((a1 + b2) >= (a2 + b1)) {
        throw new IllegalArgumentException();
    }

    double u = a1 + (b1 - a1) * rnd.nextDouble();
    double v = a2 + (b2 - a2) * rnd.nextDouble();
    return u + v;
}

Calling symTrapezoidRandom(2.0, 10.0, 1.1716) repeatedly gives you random variables that have the desired distribution.

You could do very similar things with other, more sophisticated, distributions like the Beta. This would give you other (usually more flexible) bounds on the admissible variances but you'd need a 3rd party library like commons.math for that.

abs, pow, sqrt in the code refer to the statically imported java.lang.Math methods and rnd is an instance of java.util.Random.

Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
  • Could you please tell me where you did get the variance formula for the symmetric trapezoidal from? I couldn't find it anywhere. – apophis Sep 01 '16 at 21:47
  • @apophis I computed it manually using pen and paper. The funny thing is that the formula given [here](http://doc.openturns.org/openturns-latest/sphinx/user_manual/_generated/openturns.Trapezoidal.html) is wrong although it looks reasonable. Maybe they've only swapped the variables. [This](https://de.wikipedia.org/wiki/Trapezverteilung) one is even worse in that not even `E(X)` is correct. I could not find one correct equation that uses the `(a, b, c, d)` representation, so I had to derive it myself. – Stefan Zobel Sep 05 '16 at 14:06