2

I am trying to generate random doubles with roughly equal probability for every representable finite value. I believe this would be something like a randomly signed exponential_distribution; a uniform_distribution won't work because the representable doubles are not uniformly spaced on the real line. I have this hacky code that seems to do what I want:

#include <cmath>
#include <iostream>
#include <random>

template <class GEN>
double double_distribution (GEN& generator)
{
    using gen_type = typename GEN::result_type;
    static_assert (sizeof (gen_type) == sizeof (double),
                   "generator must produce as many bits as a double holds");
    // Type-punning through a union; std::memcpy would be the strictly
    // portable alternative in C++.
    union {gen_type g; double f;} result;
    do {
        result.g = generator();           // reinterpret random bits as a double
    } while (!std::isfinite (result.f));  // discard NaNs and infinities
    return result.f;
}

int main() {
    std::mt19937_64 mt ((std::random_device())());
    for (int n = 0; n < 10; ++n) {
        std::cout << double_distribution (mt) << ' ';
    }
    std::cout << '\n';
}

but I'm wondering if there is a simpler way to approximate this using one of the STL's existing RandomNumberDistribution classes?

atb
  • Can't you get a uniform range from `[-1, 1]` and then multiply that by your max value? – NathanOliver Feb 10 '17 at 16:02
  • @NathanOliver: Maybe, could all possible doubles be generated this way? Intuitively it seems like limiting the range to `[-1,1]` would limit the entropy and multiplying by a constant would not add entropy, but maybe I'm missing something. – atb Feb 10 '17 at 16:10
  • @atb -- your intuition is right. There are fewer representable values in `[-1,1]` than there are in `[-DBL_MAX, DBL_MAX]`. For an existence proof, just consider that all the values in `[-1,1]` are also in `[-DBL_MAX, DBL_MAX]`, and 2.0 is in the latter but not the former. Not to mention a few more values... – Pete Becker Feb 10 '17 at 16:17
  • You'll also need to filter out NaNs. – Pete Becker Feb 10 '17 at 16:19
  • @PeteBecker -- Yes, I agree. Also I believe `std::isfinite` rejects nans. – atb Feb 10 '17 at 16:21
  • So you want each representable real value to be equally likely to occur, right? I think the phrase "uniformly distributed" might be throwing some people off. I also question how useful this would be. – Keith Thompson Feb 10 '17 at 16:27
  • @atb -- re: "`std::isfinite` rejects nans" -- you're right. Never mind... – Pete Becker Feb 10 '17 at 16:29
  • @KeithThompson -- I changed the wording. As for usefulness, I want it for unit testing some math. – atb Feb 10 '17 at 16:31
  • I don't have it in front of me, but if I remember correctly, in the IEEE representation of floating-point values, all the finite, non-NaN values are represented by contiguous bit values. That is, if you confine the RNG to the proper sub-range of 64-bit values you'll get only finite, non-NaN values. – Pete Becker Feb 10 '17 at 16:32
  • @PeteBecker -- interesting, maybe there is a way I can mask those bits to avoid discarding... – atb Feb 10 '17 at 17:20
  • Normal distribution should be fine for this purpose. – n. m. could be an AI Feb 10 '17 at 17:48
  • @n.m. -- yes I think maybe, but what standard deviation? – atb Feb 10 '17 at 18:45
  • You're going to need to be very careful in considering what you mean by "evenly distributed". The difference between two consecutive values of a floating point type (where, by "consecutive" I mean there are no values between them that can be also represented) is larger for values near `DBL_MAX` than for values approximately equal to `1`. There would be a difference between having equal probability for each "representable" value (you need to decide if that includes infinities, NaNs, etc if your floating point type represents them) versus uniform distribution. – Peter Feb 10 '17 at 21:01
  • @peter -- Yes, I do mean "equal probability for each representable value" other than NaN and Infs. I will update the question with that language, thanks. – atb Feb 10 '17 at 21:04
  • On the second thought, probably not. Maybe exponential or something. – n. m. could be an AI Feb 10 '17 at 23:11

2 Answers

0

If you assume a 64-bit IEEE double, here's a way. (For a different representation of double, you should be able to modify this basic approach.)

The 64 bits are divided into 1 bit for sign, 11 bits for exponent, and 52 bits for the fraction. Special values like NaN and inf are represented by special values of the exponent bits. You have two basic choices from here:

  1. Generate a random number from the set of all possible 64-bit combinations (like a uniform on an appropriately sized integer type), and set the bits of your double from that. In this case, you'll sometimes draw a number that corresponds to, e.g., NaN. You'll either need to discard those and redraw, or accept them, depending on whether you want NaN returned. (It sounds like you probably don't in your case.)

  2. Generate the bits for the sign and fraction in one draw and then separately draw the exponent. If you do this, you should be able to restrict the exponent draw so that you never pull NaN values. The special values seem to correspond to the extreme values of the exponent, so this just amounts to changing the range of your integer draw, which I think many packages support easily. In this case you don't have to re-draw to avoid the special values, but you will probably waste some "extra" bits on each draw, since you'll be pulling from an integer distribution (with a multiple-of-8 number of bits) while using only 53 for the sign+fraction and 11 for the exponent. You could probably do something smart to economize, so that the "thrown out" bits from one draw are used in the next. You'd have to think it through to make sure this doesn't introduce a dependence between your draws, but I think it should be OK.

If efficiency isn't a big deal for you, then #1 looks simpler to implement. If you need efficiency, then maybe #2 could be better, although probably at the expense of more complicated code.

Brick
  • I think (1) is basically what I'm doing already? I think (2) might also require discards as NaN and Inf are encoded in the IEEE exponent and mantissa bits, I was hoping they would be in separate bits that could be masked but they're not. – atb Feb 10 '17 at 18:42
  • I admit that I had trouble following your code, so my #1 may correspond to what you're doing. I think you're wrong about #2, though. The special values are "signed zero", inf, and NaN. Signed zero has exponent 0x0. Both inf and NaN have exponent 0x7FF, which is the largest exponent (all 11 bits equal to 1). If you draw an integer greater than zero and less than 0x7FF, you should be good. https://en.wikipedia.org/wiki/Double-precision_floating-point_format – Brick Feb 10 '17 at 19:00
  • For #1, I was thinking along the lines of bit manipulation, not a cast. After trying, I see that's not as easy as I was thinking. Here's a related question if you want to manipulate the bits of a double. Heuristically, I was thinking you'd draw an `unsigned u` via `random_device rd`, and you'd set `double d = 0 | u`. That's not quite right though, because u and d have different numbers of bits. In trying to make it better, I wanted to do bit manipulations on a double, which is not allowed directly. That leads to the question linked (among others). http://stackoverflow.com/questions/37136667/changing-one-bit-in-double – Brick Feb 10 '17 at 19:25
-1

What you're looking for is std::uniform_real_distribution.

Note that this will not work (at least on Visual Studio) for the entire range from numeric_limits<double>::lowest() to numeric_limits<double>::max(), because the calculation involves the distance between the lower and upper bounds of the range, which is obviously larger than a double can represent.

I'd suggest you rethink your approach anyway, because it's probably not actually what you want - a uniform distribution over all double values is almost exclusively going to generate values with an extremely large magnitude (almost all numbers you'll get are going to be something-something*10^300).

In any case, if you actually want to use a uniform distribution here's an example:

#include <random>
#include <limits>
#include <iostream>
#include <cmath>

int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    // sqrt so distance between min and max fits into a double
    auto max = std::sqrt(std::numeric_limits<double>::max());
    auto min = -max;
    std::cout << "min: " << min << " max: " << max << std::endl;
    std::uniform_real_distribution<double> dis(min, max);
    for (int n = 0; n < 100; ++n) {
        std::cout << dis(gen) << '\n';
    }
    std::cout << std::endl;
}
Cubic
  • I don't think I want a uniform distribution as representable doubles are not distributed uniformly on the real number line, I think what I want is closer to an exponential distribution. – atb Feb 10 '17 at 16:19
  • @atb That's not really the main problem here; with a uniform distribution, 99.999% of the numbers are going to be within 5 orders of magnitude of the largest numbers - so basically, you'll almost never get values smaller (in magnitude) than +-10^300. But yes, I imagine an exponential or normal distribution is probably more in line with what you're looking for. – Cubic Feb 10 '17 at 16:23