4

Suppose that I have a vector length calculation function, which has an additional inc parameter (this tells the distance between neighboring elements). A simple implementation would be:

float calcLength(const float *v, int size, int inc) {
    float l = 0;

    for (int i=0; i<size*inc; i += inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

Now, calcLength can be called with two kind of inc parameters: when inc is known at compile-time, and when it is not. I'd like to have an optimized calcLength version for common compile-time values of inc (like 1).

So, I'd have something like this:

template <int C>
struct Constant {
    static constexpr int value() {
        return C;
    }
};

struct Var {
    int v;

    constexpr Var(int p_v) : v(p_v) { }

    constexpr int value() const {
        return v;
    }
};

template <typename INC>
float calcLength(const float *v, int size, INC inc) {
        float l = 0;

        for (int i=0; i<size*inc.value(); i += inc.value()) {
            l += v[i]*v[i];
        }
        return sqrt(l);
    }
}

So, this can be used:

calcLength(v, size, Constant<1>()); // inc is a compile-time constant 1 here, calcLength can be vectorized

or

int inc = <some_value>;
calcLength(v, size, Var(inc)); // inc is a non-compile-time constant here, less possibilities of compiler optimization

My question is, would it be possible somehow to keep the original interface, and put Constant/Var in automatically, depending on the type (compile-time constant or not) of inc?

calcLength(v, size, 1); // this should end up calcLength(v, size, Constant<1>());
calcLength(v, size, inc); // this should end up calcLength(v, size, Var(int));

Note: this is a simple example. In my actual problem, I have several functions like calcLength, and they are large, I don't want the compiler to inline them.


Note2: I'm open to different approaches as well. Basically, I'd like to have a solution, which fulfills these:

  • the algorithm is specified once (most likely in a template function)
  • if I specify 1 as inc, a special function instantiated, and the code most likely gets vectorized
  • if inc is not a compile-time constant, a general function is called
  • otherwise (non-1 compile-time constant): doesn't matter which function is called
geza
  • 28,403
  • 6
  • 61
  • 135
  • AFAIK, unless the entire function is `constexpr` passing a compile time constant doesn't gain you anything. One thing you could do though is make the constant a non type template parameter. Then the value will be known at compile time and the compiler can optimize accordingly. – NathanOliver Mar 19 '19 at 19:40
  • @NathanOliver: as far as I see, this is what I'm doing in the question. With one difference: to allow both kind of `inc`, it has a type template parameter. But the outcome is the same. – geza Mar 19 '19 at 19:43
  • Are you looking for [if constexpr](https://en.cppreference.com/w/cpp/language/if) ? – Jesper Juhl Mar 19 '19 at 19:45
  • I mean a function like `template float calcLength(const float *v, int size) { use inc here as a compile time value }`. – NathanOliver Mar 19 '19 at 19:45
  • @NathanOliver: your suggestion has the same outcome as my solution, if `Constant` used as `inc`. – geza Mar 19 '19 at 19:46
  • Do you have any benchmarks conforming you conjuncture about "better optimizations"? I can image hypothetical cases if inc % 8 == 0 or inc % 16 ==0, but would not be sure it can vectorized much better. – Dmitry Mar 19 '19 at 19:48
  • 1
    @Dmitry: in my case, if `inc` is known at compile-time, it will be 1 99% of the time. And of course, it can be optimized much better. It has a huge speed difference. – geza Mar 19 '19 at 19:50
  • @geza I think "do you have benchmarks" means "post your benchmark so I can verify that my solution works". – anatolyg Mar 19 '19 at 21:55
  • @anatolyg: I have benchmarks, but not for this simple case. But actually, this question doesn't need a benchmark. It's more of a language question. Basically, I'd like to have a function, which can be automatically compiled for compile-time constant. My code works, but I dislike the manual `Constant`/`Var` specification. I'd like this to be automatic. – geza Mar 19 '19 at 22:19
  • 2
    So you have evidence that this approach is actually better on your architecture than just making the function inline and letting the optimizer worry about it? That would be good to explain or at least mention in the question. – aschepler Mar 19 '19 at 22:41
  • @aschepler: I mentioned in the question that I have several functions, and they are huge. I'm absolutely sure that the compiler won't inline them because of their size. I'd need to use some `forceinline` feature. But I don't want to, because compiled code-size will be much bigger. It would be a mistake to inline these functions on any current architecture. – geza Mar 19 '19 at 23:18
  • @JesperJuhl: I don't know how I can use `if constexpr` in my problem, so probably not. – geza Mar 19 '19 at 23:20
  • But if `inc` isn't a compile-time constant but with value `1`, which function should be called? It's OK, for you, if it's called the `Constant<1>` version? – max66 Mar 20 '19 at 01:10
  • @max66: basically it's never the case. But I think I see why you ask this: I could add a little inline wrapper function, which checks for 1. I don't really like this solution, because it adds an unnecessary if for the not-compile-time constant case. If that's the only solution, I'll keep using `Constant`/`Var`. – geza Mar 20 '19 at 10:02
  • [std::integral_constant](https://en.cppreference.com/w/cpp/types/integral_constant), but It doesn't seem to have been effective: https://godbolt.org/z/YuSK0L – Mooing Duck Apr 12 '19 at 17:49
  • Now that I enabled optimizations, both versions got calculated at compile time: https://godbolt.org/z/-s1xNS – Mooing Duck Apr 12 '19 at 17:55
  • If you don't mind portability too much, `__builtin_constant_p` is a great tool for this kind of optimization. – Marc Glisse Apr 12 '19 at 18:06
  • If you are using GCC, as @MarcGlisse suggested, [__builtin_constant_p](https://godbolt.org/z/uuPutd) would work but if conditions are still runtime one in this example... – Hiroki Apr 12 '19 at 18:41

3 Answers3

2

If the goal here is simply to optimize, rather than enable use in a compile-time context, you can give the compiler hints about your intent:

static float calcLength_inner(const float *v, int size, int inc) {
    float l = 0;

    for (int i=0; i<size*inc; i += inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

float calcLength(const float *v, int size, int inc) {
    if (inc == 1) {
        return calcLength_inner(v, size, inc);  // compiler knows inc == 1 here, and will optimize
    }
    else {
        return calcLength_inner(v, size, inc);
    }
}

From godbolt, you can see that calcLength_inner has been instantiated twice, both with and without the constant propagation.

This is a C trick (and is used extensively inside numpy), but you can write a simple wrapper to make it easier to use in c++:

// give the compiler a hint that it can optimize `f` with knowledge of `cond`
template<typename Func>
auto optimize_for(bool cond, Func&& f) {
    if (cond) {
        return std::forward<Func>(f)();
    }
    else {
        return std::forward<Func>(f)();
    }
}

float calcLength(const float *v, int size, int inc) {
    return optimize_for(inc == 1, [&]{
        float l = 0;
        for (int i=0; i<size*inc; i += inc) {
            l += v[i]*v[i];
        }
        return sqrt(l);
    });
}
Eric
  • 95,302
  • 53
  • 242
  • 374
0

C++ does not provide a way to detect whether a supplied function parameter is a constant expression or not, so you cannot automatically differentiate between supplied literals and runtime values.

If the parameter must be a function parameter, and you're not willing to change the way it is called in the two cases, then the only lever you have here is the type of the parameter: your suggestions for Constant<1>() vs Var(inc) are pretty good in that regard.

Anthony Williams
  • 66,628
  • 14
  • 133
  • 155
0

Option 1: Trust you compiler (aka do nothing)

Can compilers can do what you want without you lifting a finger (well, you need to enable optimized builds, but that goes without saying).

Compilers can create what are called "function clones", which do what you want. A clone function is a copy of a function used for constant propagation, aka the resulting assembly of a function called with constant arguments. I found little documentation about this feature, so it's up to you if you want to rely on it.

The compiler can inline this function altogether, potentially making your problem a non-problem (you can help it by defining it inline in a header, using lto and/or using compiler specific attributes like __attribute__((always_inline)))

Now, I am not preaching to let the compiler do its job. Although the compiler optimizations are amazing these times and the rule of thumb is to trust the optimizer, there are situation where you need to manually intervene. I am just saying to be aware of and to take it into consideration. Oh, and as always measure, measure, measure when it comes to performance, don't use your "I feel here I need to optimize" gut.

Option 2: Two overloads

float calcLength(const float *v, int size, int inc) {
    float l = 0;

    for (int i=0; i<size*inc; i += inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

template <int Inc>
float calcLength(const float *v, int size) {
    float l = 0;

    for (int i=0; i<size*inc; i += inc) {
        l += v[i]*v[i];
    }
    return sqrt(l);
}

The disadvantage here is code duplication, ofc. Also little care need to be taken at the call site:

calcLength(v, size, inc); // ok
calcLength<1>(v, size);   // ok
calcLength(v, size, 1);   // nope

Option 3: Your version

Your version is ok.

Community
  • 1
  • 1
bolov
  • 72,283
  • 15
  • 145
  • 224