Efficient division of an int by intmax

Question

I have an integer of type uint32_t and would like to divide it by a maximum value of uint32_t and obtain the result as a float (in range 0..1).

Naturally, I can do the following:

float result = static_cast<float>(static_cast<double>(value) / static_cast<double>(std::numeric_limits<uint32_t>::max()))

This is however quite a lot of conversions on the way, and a the division itself may be expensive.

Is there a way to achieve the above operation faster, without division and excess type conversions? Or maybe I shouldn't worry because modern compilers are able to generate an efficient code already?

Edit: division by MAX+1, effectively giving me a float in range [0..1) would be fine too.

A bit more context:

I use the above transformation in a time-critical loop, with uint32_t being produced from a relatively fast random-number generator (such as pcg). I expect that the conversions/divisions from the above transformation may have some noticable, albeit not overwhelming, negative impact on the performance of my code.

By any chance, do you have to divide by `maxint` and not by `maxint + 1`? If `maxint+1` works, probably some bit magic can be done, since it's a division by `2^31`, which is essentially a subtraction from the exponent part of the number. — , Aug 20 '19 at 08:55
Do you need to convert the number to a float at all during the loop? Doesn't `value` represent the numerator of the fraction `value / std::numeric_limits::max()` ? — Richard Hodges, Aug 20 '19 at 09:15
Also note that on many architectures (especially Intel x86 and x64), storing `float` values is slower than storing `double` values - and the arithmetic operations are generally done in `double`, anyway. So maybe avoid `float` altogether? — Adrian Mole, Aug 20 '19 at 09:16
@CygnusX1 Note that using `float(value) / float(1ul << 32ul)` will most probably give you a range of `[0, 1]` instead of `[0, 1)` (which is what one normally wants for real distributions). `uniform_real_distribution` gives you `[0, 1)` (unless you use an old buggy version of `g++` or clang++`). — Ted Lyngmo, Aug 21 '19 at 05:58
I tried to compare several different solutions and on quick-bech, no one was faster than your original appoach: http://quick-bench.com/724l2j52SWB1Uh1tIzjHPAlknl4. — Daniel Langr, Aug 21 '19 at 07:27

Ted Lyngmo · Answer 1 · 2019-08-23T23:21:51.173

This sounds like a job for:

std::uniform_real_distribution<float> dist(0.f, 1.f);

I would trust that to give you an unbiased conversion to float in the range [0, 1) as efficiently as possible. If you want the range to be [0, 1] you could use this:

std::uniform_real_distribution<float> dist(0.f, std::nextafter(1.f, 2.f))

Here's an example with two instances of a not-so-random number generator that generates min and max for uint32_t:

#include <iostream>
#include <limits>
#include <random>

struct ui32gen {
    constexpr ui32gen(uint32_t x) : value(x) {}
    uint32_t operator()() { return value; }
    static constexpr uint32_t min() { return 0; }
    static constexpr uint32_t max() { return std::numeric_limits<uint32_t>::max(); }
    uint32_t value;
};

int main() {
    ui32gen min(ui32gen::min());
    ui32gen max(ui32gen::max());

    std::uniform_real_distribution<float> dist(0.f, 1.f);

    std::cout << dist(min) << "\n";
    std::cout << dist(max) << "\n";
}

Output:

0
1

Is there a way to achieve the operation faster, without division and excess type conversions?

If you want to manually do something similar to what uniform_real_distribution does (but much faster, and slightly biased towards lower values), you can define a function like this:

// [0, 1)  the common range
inline float zero_to_one_exclusive(uint32_t value) {
    static const float f_mul =
        std::nextafter(1.f / float(std::numeric_limits<uint32_t>::max()), 0.f);

    return float(value) * f_mul;
}

It uses multiplication instead of division since that often is a bit faster (than your original suggestion) and only has one type conversion. Here's a comparison of division vs. multiplication.

If you really want the range to be [0, 1], you can do like below, which will also be slightly biased towards lower values compared to what std::uniform_real_distribution<float> dist(0.f, std::nextafter(1.f, 2.f)) would produce:

// [0, 1]  the not so common range
inline float zero_to_one_inclusive(uint32_t value) {
    static const float f_mul = 1.f/float(std::numeric_limits<uint32_t>::max());

    return float(value) * f_mul;
}

Here's a benchmark comparing uniform_real_distribution to zero_to_one_exclusive and zero_to_one_inclusive.

@MSalters The idea was for OP to replace the use of `ui32gen` in the example with the real RNG (such as a PCG) he/she mentioned, so perhaps it wasn't that creative after all :-) — Ted Lyngmo, Aug 20 '19 at 23:09
This is clever, but I don't think `std::uniform_real_distribution` has to work this way. AFAIK it's allowed to use a more clever mapping, have some state, or even invoke `ui32gen::operator()` multiple times. — HolyBlackCat, Aug 21 '19 at 06:39
@HolyBlackCat If it did multiple calls to `operator()` it would surprise me. That would slow down random number generation considerably. It may use some clever mapping but afaik it's supposed to be a linear transformation. I've concluded by observation (not by looking at the code) that it matches the `zero_to_one_exclusive` function I just added in both `g++` and `clang++`. — Ted Lyngmo, Aug 21 '19 at 07:05
I don't mind downvotes if I can get a hint why. I.m.o. this answers OP:s question. `uniform_real_distribution` is designed to be used with RNG:s (like the PCG). I also gave faster, but not as accurate, alternatives for both the usual half-open interval and the closed interval. — Ted Lyngmo, Aug 21 '19 at 19:13

463035818_is_not_an_ai · Answer 2 · 2019-08-20T08:55:52.873

2

Two of the casts are superfluous. You dont need to cast to float when anyhow you assign to a float. Also it is sufficient to cast one of the operands to avoid integer arithmetics. So we are left with

float result = static_cast<double>(value) / std::numeric_limits<int>::max();

This last cast you cannot avoid (otherwise you would get integer arithmetics).

Or maybe I shouldn't worry because modern compilers are able to generate an efficient code already?

Definitely a yes and no! Yes, trust the compiler that it knows best to optimize code and write for readability first. And no, dont blindy trust. Look at the output of the compiler. Compare different versions and measure them.

Is there a way to achieve the above operation faster, without division [...] ?

Probably yes. Dividing by std::numeric_limits<int>::max() is so special, that I wouldn't be too surprised if the compiler comes with some tricks. My first approach would again be to look at the output of the compiler and maybe compare different compilers. Only if the compilers output turns out to be suboptimal I'd bother to enter some manual bit-fiddling.

For further reading this might be of interest: How expensive is it to convert between int and double? . TL;DR: it actually depends on the hardware.

edited Aug 20 '19 at 08:55

answered Aug 20 '19 at 08:52

463035818_is_not_an_ai

109,796
11
89
185

2

Multiplying by the reciprocal might be considered, though I'd suspect a smart compiler to already do this for a constant divisor. – schnaader Aug 20 '19 at 08:59
3

`std::numeric_limits::max()` is presumably `(1 << 32) - 1` (one less than a power of two) and probably has no good way to be simplified in floating point arithmetic. However, `(1 << 32)` as divisor is a plain power of two and thus has an exact inverse in floating point, so you can do a multiplication instead of a division. – Max Langhof Aug 20 '19 at 09:00
2

@schnaader That won't be considered unless you give up the exactness guarantees that the arithmetic operators normally have: https://godbolt.org/z/N9Jmnj – Max Langhof Aug 20 '19 at 09:00
The two casts you removed from the code, they are still performed right? I included them just to be more explicit in the conversions needed. – CygnusX1 Aug 20 '19 at 20:31

score 2 · Answer 3 · answered Aug 20 '19 at 09:34

If performance were a real concern I think I'd be inclined to represent this 'integer that is really a fraction' in its own class and perform any conversion only where necessary.

For example:

#include <iostream>
#include <cstdint>
#include <limits>

struct fraction
{
    using value_type = std::uint32_t;

    constexpr explicit fraction(value_type num = 0) : numerator_(num) {}

    static constexpr auto denominator() -> value_type { return std::numeric_limits<value_type>::max(); }

    constexpr auto numerator() const -> value_type { return numerator_; }

    constexpr auto as_double() const -> double {
        return double(numerator()) / denominator();
    }

    constexpr auto as_float() const -> float {
        return float(as_double());
    }

private:

    value_type numerator_;
};

auto generate() -> std::uint32_t;

int main()
{
    auto frac = fraction(generate());

    // use/manipulate/display frac here ...

    // ... and finally convert to double/float if necessary

    std::cout << frac.as_double() << std::endl;
}

However if you look at code gen on godbolt you'll see that the CPU's floating point instructions take care of the conversion. I'd be inclined to measure performance before you run the risk of wasting time on early optimisation.

Efficient division of an int by intmax

3 Answers3