The setup:
I have a function that uses SIMD intrinsics and would like to use it inside some constexpr functions.
For that, I need to make it constexpr. However, the SIMD intrinsics are not marked constexpr, and the constant evaluator of the compiler cannot handle them.
I tried replacing the SIMD intrinsics with a C++ constexpr implementation that does the same thing. The function became 3.5x slower at run-time, but I was able to use it at compile-time (yay?).
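To make the setup concrete, here is a minimal sketch of the two variants. The operation (a four-lane 32-bit add), the Vec4 type, and the names add_scalar and add_simd are illustrative assumptions, not my actual function. The SSE2 path is guarded so the sketch still compiles on targets without it.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical data type: four unsigned 32-bit lanes.
struct Vec4 { std::uint32_t v[4]; };

// Plain C++ version: usable in constant expressions (needs C++14 for the loop).
constexpr Vec4 add_scalar(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Intrinsic version: cannot be constexpr, because the intrinsics are not.
inline Vec4 add_simd(const Vec4& a, const Vec4& b) {
#if defined(__SSE2__)
    const __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.v));
    const __m128i y = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.v));
    Vec4 r{};
    _mm_storeu_si128(reinterpret_cast<__m128i*>(r.v), _mm_add_epi32(x, y));
    return r;
#else
    return add_scalar(a, b);  // no SSE2 on this target: fall back
#endif
}

// Only the scalar version can be used at compile time.
static_assert(add_scalar(Vec4{{1, 2, 3, 4}}, Vec4{{10, 20, 30, 40}}).v[2] == 33,
              "scalar version is constant-evaluable");
```

Both functions compute the same result; the difference is only which contexts accept them and how fast they run.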
The problem:
How can I use this function inside constant expressions without slowing down my program at run-time?
Some ideas:
- Adding support for constant-evaluating all SIMD intrinsics to every compiler's constant expression evaluator: probably the right solution, but a titanic and practically impossible task.
More pragmatic solutions would be to either:
- overload a function depending on whether it is being executed inside a constant expression (that is, provide a constexpr, and a non-constexpr version).
- or, somehow branch inside a constexpr function between the constexpr and run-time implementation (that is, detect in a branch whether the function is being executed inside a constant expression).
Anyhow, I am open to any suggestion that solves my problem.
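The second idea above (branching inside one constexpr function) could look roughly like the sketch below. It assumes a facility that reports whether the current evaluation is a constant evaluation: C++20 provides std::is_constant_evaluated for this, and the fallback to the __builtin_is_constant_evaluated compiler builtin for older language modes is an assumption about the toolchain. All function names are hypothetical, and sum4_fast is only a stand-in for the intrinsic-based code.

```cpp
#include <cstdint>
#include <type_traits>

int fast_calls = 0;  // only here to make the run-time dispatch observable

// Stand-in for the SIMD implementation: deliberately not constexpr.
inline std::uint32_t sum4_fast(const std::uint32_t* p) {
    ++fast_calls;
    return p[0] + p[1] + p[2] + p[3];
}

// Portable implementation the constant evaluator can execute.
constexpr std::uint32_t sum4_portable(const std::uint32_t* p) {
    std::uint32_t s = 0;
    for (int i = 0; i < 4; ++i) s += p[i];
    return s;
}

// Hypothetical helper: true only while being constant-evaluated.
constexpr bool in_constant_evaluation() noexcept {
#if defined(__cpp_lib_is_constant_evaluated)
    return std::is_constant_evaluated();       // C++20
#else
    return __builtin_is_constant_evaluated();  // assumption: compiler builtin exists
#endif
}

// One entry point: portable path at compile time, fast path at run time.
constexpr std::uint32_t sum4(const std::uint32_t* p) {
    if (in_constant_evaluation()) {  // a plain if, not if constexpr
        return sum4_portable(p);
    } else {
        return sum4_fast(p);
    }
}

constexpr std::uint32_t data[4] = {1, 2, 3, 4};
static_assert(sum4(data) == 10, "compile-time call takes the portable path");
```

A caller that is itself constexpr can call sum4 unconditionally; which branch runs is decided per evaluation, not per call site.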
Hints:
- @RMartinhoFernandes suggested in the Lounge to use __builtin_constant_p to detect whether the function arguments are all constant expressions, in which case the compiler would hopefully at least attempt to evaluate the function at compile-time.
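A rough sketch of that hint, assuming a GCC-compatible compiler (the builtin is a compiler extension, not standard C++). The function names are hypothetical and twice_fast merely stands in for the intrinsic-based code. Note that the builtin's answer at run time depends on optimization level and inlining, and how it behaves inside the constant evaluator varies between compilers and versions, so the sketch deliberately does not rely on a static_assert.

```cpp
#include <cstdint>

// Stand-in for the SIMD implementation: deliberately not constexpr.
inline std::uint32_t twice_fast(std::uint32_t x) { return x << 1; }

// Portable implementation the constant evaluator can execute.
constexpr std::uint32_t twice_portable(std::uint32_t x) { return x * 2u; }

constexpr std::uint32_t twice(std::uint32_t x) {
#if defined(__GNUC__) || defined(__clang__)
    // Nonzero when the compiler can prove that x is a compile-time constant.
    // In that case the portable path costs nothing, since it can be folded.
    return __builtin_constant_p(x) ? twice_portable(x) : twice_fast(x);
#else
    // Assumption: without the builtin, fall back to the portable path only.
    return twice_portable(x);
#endif
}
```

Whichever branch is chosen, the result is the same; only the cost differs, which makes this hard to verify without inspecting the generated code.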
Failed attempts:
- @Jarod42 made the straightforward suggestion of just using two independent functions. I would briefly like to point out why this cannot work, because the reason is not obvious. This solution assumes that it is known at the call site whether the function will be constant-evaluated or not, but this is not the case. Consider a constexpr function that calls mine: which version of my function should it pick? It must pick the constexpr one in order to compile, but that "outer" constexpr function could still be evaluated at run-time. In that case, it would use the "slow" compile-time implementation, and hence this approach does not solve the problem.
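The failure mode can be sketched as follows. All names are hypothetical; lane_sum_fast stands in for the intrinsic-based version and lane_sum_constexpr for the slow portable one.

```cpp
#include <cstdint>

// Fast version: stand-in for the SIMD code, not constexpr.
inline std::uint32_t lane_sum_fast(const std::uint32_t* p) {
    return p[0] + p[1] + p[2] + p[3];
}

// Slow version: plain C++, constexpr.
constexpr std::uint32_t lane_sum_constexpr(const std::uint32_t* p) {
    std::uint32_t s = 0;
    for (int i = 0; i < 4; ++i) s += p[i];
    return s;
}

// The outer function wants to be usable in constant expressions,
// so it has no choice but to call the constexpr version.
constexpr std::uint32_t checksum(const std::uint32_t* p) {
    return lane_sum_constexpr(p) ^ 0xFFu;
}

// Compile-time use works as intended.
constexpr std::uint32_t sample[4] = {1, 2, 3, 4};
static_assert(checksum(sample) == 245u, "10 ^ 0xFF == 245");

// Run-time use of the same outer function still goes through
// lane_sum_constexpr; lane_sum_fast is never reached from here.
inline std::uint32_t checksum_at_runtime(const std::uint32_t* p) {
    return checksum(p);
}
```

The choice between the two versions is baked into checksum when it is written, while the information needed to choose correctly only exists per evaluation.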