Speeding up a do-while loop with loop unrolling

Question

I am trying to speed up code in a function that may be called many times over (maybe more than a million). The code has to do with setting two variables to random numbers and finding squared distance. My first idea for this is loop unrolling but I am a bit confused on how I should execute this because of the while condition that dictates it.

To speed up my program I have replaced the c++ built in rand() function with a custom one but I am stumped on how to make my program even faster.

do {
    x = customRand();
    y = customRand();
    distance = x * x + y * y; // euclidean square distance
} while (distance >= 1.0);

Loop unrolling today is a "micro optimization" that your compiler is more than capable of determining whether there is any benefit in that type of optimization in your case. Better to focus on writing clear, well documented code, and leave the micro-optimizations to the compiler (it will do a much better job 9999 times out of 10000, with much less chance of error. — David C. Rankin, May 25 '19 at 09:30
This really depends on what `customRand()`returns. 0 to 100 is a lot harder to get `distance < 1.0`, 0 to 1 though makes it "faster". — Hatted Rooster, May 25 '19 at 09:31
You would have several `if (x*x+y*y<1) break;` lines in the loop. But I doubt it would speed up the execution. You should compare the running times between the code optimized by compiler and your own optimization. — Dialecticus, May 25 '19 at 09:32
You should not use `rand`. I don't want to underestimate your programming skills but are you sure `` library is worse than your `customRand`? — Quimby, May 25 '19 at 09:33
The speed of your code is most probably dictated by the complexity of the random number generator. — mfnx, May 25 '19 at 09:58
Firstly, use `` and find a suitable generator that meets your need. Rather than keep generating random values until you get a pair that meets your requirement, transform the `x` and `y` values in a way that unambiguously meets your requirement. For example, if you choose a generator that is guaranteed to generate a value between `0` and `10`, simply adding `1` to both `x` and `y` will guarantee your condition is met. No need to calculate `distance`, no need to check it, and no need for the loop at all. — Peter, May 25 '19 at 09:58
Loop unrolling is an optimization only if it can amortize the loop termination condition. Clearly not practical in this case. You *can* optimize it, this loop can only take a substantial amount of time when the random numbers are small. No point making it slug to get to 1.0 with well-distributed random numbers. Just give it a leg up and start at, say, distance = 0.9; — Hans Passant, May 25 '19 at 10:35

Kit. · Answer 1 · 2019-05-25T10:45:13.717

1

You shouldn't expect to make your program faster by loop unrolling, because with a properly selected range ([-1, 1]) for the random number generator output your loop's body will be executed just once in more than 3/4 of the cases.

What you might want to do to help the compiler is to mark your while condition as "unlikely". For example, in GCC it would be:

#define unlikely(x) __builtin_expect((x),0)

do {
    x = customRand();
    y = customRand();
    distance = x * x + y * y; // euclidean square distance
} while (unlikely(distance >= 1.0));

Still, even this is unlikely to speed up your code in a measurable way.

If you were about guaranteed time and not speed, then for an uniform random distribution within a circle, with customRand() uniformly distributed in [-1, 1]

r = std::sqrt(std::abs(customRand()));
t = M_PI * customRand();
x = r * std::cos(t);
y = r * std::sin(t);

would do the trick.

edited May 25 '19 at 10:45

answered May 25 '19 at 10:39

Kit.

2,386
1
12
14

In this particular situation GCC generates exactly the same assembly code with and without `__builtin_expect`: https://godbolt.org/z/NWi-r9. – Evg May 25 '19 at 10:47
It depends on whether the compiler can see the `customRand()` body and what the body is. For example, [here](https://godbolt.org/z/Sl8jLQ) it generates a different code. But as I said, it is unlikely to noticeably speed up anything in this particular task. – Kit. May 25 '19 at 11:13
Agree. Your example is funny: the `do` loop degenerates into an infinite trivial one `A: jmp A;`, and you either jump into it after a single execution of `do` body or not. – Evg May 25 '19 at 11:42

score 0 · Answer 2 · answered May 25 '19 at 10:27

It smells like a math-based XY problem, but I will first focus on your question about loop speedup. I will still do some math though.

So basic speedup may use early continue if any of x or y is bigger or equal 1.0. There is no chance that x*x + y*y < 1.0 when one of x or y is bigger than 1.

do {
    float x = customRand();
    if (x >= 1.0) {
        continue;
    }
    float y = customRand();
    if (y >= 1.0) {
        continue;
    }
    distance = x * x + y * y; // euclidean square distance
} while (distance >= 1.0);

Unfortunately, it most likely screws up modern CPUs branch prediction mechanism and still may not give big speedup. Benchmarks, as always, would be required.

Also, as the fastest code is the one which does not run, let's discuss the potential XY problem. You seem to generate point which is bounded by circle with radius 1.0. In Cartesian system it is tough task, but it is really easy in polar system, where each point in disk is described with radius and angle and can be easily converted to Cartesian. I don't know what distribution do you expect over disk area, but see answers for this problem here: Generate a random point within a circle (uniformly)

PS. There are some valid comments about <random>, which as std library may lead to some speedup as well.

Speeding up a do-while loop with loop unrolling

2 Answers2