13

Here are free functions that do the same but in the first case the loop is not vectorized but in the other cases it is. Why is that?

#include <vector>

typedef std::vector<double> Vec;

void update(Vec& a, const Vec& b, double gamma) {
    const size_t K = a.size();
    for (size_t i = 0; i < K; ++i) { // not vectorized
        a[i] = b[i] * gamma - a[i];
    }
}

void update2(Vec& a, const Vec& b, double gamma) {
    for (size_t i = 0; i < a.size(); ++i) { // vectorized
        a[i] = b[i] * gamma - a[i];
    }
}

void update3(Vec& a, size_t K, const Vec& b, double gamma) {
    for (size_t i = 0; i < K; ++i) { // vectorized
        a[i] = b[i] * gamma - a[i];
    }
}

int main(int argc, const char* argv[]) {
    Vec a(argc), b;
    update(a, b, 0.5);
    update2(a, b, 0.5);
    update3(a, a.size(), b, 0.5);
    return 0;
}

Relevant messages from the compiler (VS2013):

1>  c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(7) : info C5002: loop not vectorized due to reason '1200'
1>  c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(13) : info C5001: loop vectorized
1>  c:\home\dima\trws\trw_s-v1.3\trws\test\vector.cpp(19) : info C5001: loop vectorized

From comment by @tony

Reason 1200: "Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences." source

bolov
  • 72,283
  • 15
  • 145
  • 224
Dmitry
  • 146
  • 3
  • 2
    Try a different compiler? Others (gcc and clang) do vectorize all 3 functions. – Marc Glisse May 08 '15 at 21:35
  • 1
    What is "reason 1200" documented as? – Alan Stokes May 08 '15 at 22:05
  • 1
    Reason 1200: "Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences." [source](https://msdn.microsoft.com/en-us/library/jj658585.aspx#BKMK_ReasonCode120x) – Tony May 08 '15 at 22:16
  • 1
    I'm surprised that any compiler is able to vectorize any of those at all. The compiler needs to prove that either `a` and `b` do not alias or that `&a[0] < &b[0]`. – Mysticial May 09 '15 at 07:52
  • 1
    @Mysticial Or much more easily, the compiler can generate both the sequential and vectorized versions, and have a runtime test checking if a and b overlap to dispatch to either version. – Marc Glisse May 10 '15 at 08:11
  • @Mysticial, from [sum-of-overlapping-arrays-auto-vectorization-and-restrict](https://stackoverflow.com/questions/23651055/sum-of-overlapping-arrays-auto-vectorization-and-restrict) GCC builds a version with and without overalap (which it can vectorize) and then when calling the function it checks for overlap first. Using `restrict` does not build the version with overlap and removes the check. – Z boson May 11 '15 at 07:28
  • All three loops seems sun-optimal to me because compiler assumes `a` and `b` aliases. For best performance, use restricted pointers to access data elements instead. – user3528438 Aug 13 '15 at 13:36
  • @user3528438 c++ does not have `restrict` – bolov Aug 13 '15 at 14:05
  • @bolov but MSVC have restricted keyword, although ideally a C99 implementation and a C++ wrapper is the best. – user3528438 Aug 13 '15 at 14:23

1 Answers1

2

I guess it's some deeply internal compiler implementation issue, like at what stage did auto-vectorizer "kick in" and what's the state of the internal representation of the code at that time. When I tried on MSVC2017, it worked more in line with what one would expect. It auto-vectorized update() and update3(), but not update2(), with the reason given 501 for line 14, which is documented as:

Induction variable is not local; or upper bound is not loop-invariant.

srdjan.veljkovic
  • 2,468
  • 16
  • 24