Before replacing a lot of my "old" for loops with range-based for loops, I ran some tests with Visual Studio 2013:

std::vector<int> numbers;

for (int i = 0; i < 50; ++i) numbers.push_back(i);

int sum = 0;

//vectorization
for (auto number = numbers.begin(); number != numbers.end(); ++number) sum += *number;

//vectorization
for (auto number = numbers.begin(); number != numbers.end(); ++number) {
    auto && ref = *number;
    sum += ref;
}

//definition of range based for loops from http://en.cppreference.com/w/cpp/language/range-for
//vectorization
for (auto __begin = numbers.begin(),
    __end = numbers.end();
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//no vectorization :(
for (auto number : numbers) sum += number;

//no vectorization :(
for (auto& number : numbers) sum += number;

//no vectorization :(
for (const auto& number : numbers) sum += number;

//no vectorization :(
for (auto&& number : numbers) sum += number;

printf("%d\n", sum); // sum is an int, so %d rather than %f

Looking at the disassembly, the standard for loops were all vectorized:

00BFE9B0  vpaddd      xmm1,xmm1,xmmword ptr [eax]  
00BFE9B4  add         ecx,4  
00BFE9B7  add         eax,10h  
00BFE9BA  cmp         ecx,edx  
00BFE9BC  jne         main+140h (0BFE9B0h)  

but the range-based for loops were not:

00BFEAC6  add         esi,dword ptr [eax]  
00BFEAC8  lea         eax,[eax+4]  
00BFEACB  inc         ecx  
00BFEACC  cmp         ecx,edi  
00BFEACE  jne         main+256h (0BFEAC6h)  

Is there any reason why the compiler couldn't vectorize these loops?

I would really like to use the new syntax, but losing vectorization would be too bad.

I just saw this question, so I tried the /Qvec-report:2 flag, which gives another reason:

loop not vectorized due to reason '1200'

that is:

Loop contains loop-carried data dependences that prevent vectorization. Different iterations of the loop interfere with each other such that vectorizing the loop would produce wrong answers, and the auto-vectorizer cannot prove to itself that there are no such data dependences.
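
To make that message concrete, here is a minimal sketch (the function names are hypothetical and not part of the test above) contrasting a loop with a real loop-carried dependence against a plain sum. In the sum, the only state carried between iterations is the accumulator, and because integer addition is associative a compiler is allowed to keep partial sums in separate SIMD lanes and combine them afterwards. Whether a particular compiler actually does so is a separate matter.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: iteration i reads the value written by iteration i - 1,
// so the iterations cannot simply be executed side by side.
void prefix_sum_in_place(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i)
        v[i] += v[i - 1];  // genuine loop-carried dependence
}

// Hypothetical helper: the elements are only read, and the accumulator is
// the only value carried from one iteration to the next (a reduction).
int sum_all(const std::vector<int>& v) {
    int sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return sum;
}
```

The loops in the test above have the second shape, which is why reason 1200 is surprising for them.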

Is this the same bug? (I also tried the latest VC++ compiler, the "Nov 2013 CTP".)

Should I report it on MS Connect too?

Edit

Due to the comments, I ran the same test with a raw int array instead of a vector, so no iterator class is involved, just raw pointers.

Now all loops are vectorized except the two "simulated range-based" loops.

Compiler says this is due to reason '501':

Induction variable is not local; or upper bound is not loop-invariant.

I don't get what's going on...

const size_t size = 50;
int numbers[size];

for (size_t i = 0; i < size; ++i) numbers[i] = i;

int sum = 0;

//vectorization
for (auto number = &numbers[0]; number != &numbers[0] + size; ++number) sum += *number;

//vectorization
for (auto number = &numbers[0]; number != &numbers[0] + size; ++number) {
    auto && ref = *number;
    sum += ref;
}

//definition of range based for loops from http://en.cppreference.com/w/cpp/language/range-for
//NO vectorization ?!
for (auto __begin = &numbers[0],
    __end = &numbers[0] + size;
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//NO vectorization ?!
for (auto __begin = &numbers[0],
    __end = &numbers[0] + size;
    __begin != __end; ++__begin) {
    auto && ref = *__begin;
    sum += ref;
}

//vectorization ?!
for (auto number : numbers) sum += number;

//vectorization ?!
for (auto& number : numbers) sum += number;

//vectorization ?!
for (const auto& number : numbers) sum += number;

//vectorization ?!
for (auto&& number : numbers) sum += number;

printf("%d\n", sum); // sum is an int, so %d rather than %f
ThreeStarProgrammer57
  • It seems the compiler can't look through the iterator type. Try your range-based `for` emulation with `&v[0]` and `&v[0] + v.size()` to confirm this suspicion. – Dietmar Kühl Nov 15 '14 at 01:17
  • @DietmarKühl If I have understood correctly, I tried: `for (auto __begin = &numbers[0], __end = &numbers[0] + numbers.size(); __begin != __end; ++__begin) { auto && ref = *__begin; sum += ref; }` But this also vectorizes. – ThreeStarProgrammer57 Nov 15 '14 at 01:23
  • If the version using pointers vectorizes the loop, clearly the iterator wrapping the pointer upsets the compiler: the type returned from `std::vector::begin()` doesn't have to be `T*` (or `T const*`). It seems the compiler can't detect that this iterator is nothing more than a thin wrapper over a pointer. – Dietmar Kühl Nov 15 '14 at 01:35
  • @DietmarKühl I did the same with raw pointers and array, please see the edit. – ThreeStarProgrammer57 Nov 15 '14 at 02:03
  • If it isn't the iterator vs. pointer upsetting the compiler, it surely is something else. Possibly the compiler doesn't like the use of `end` instead of `__begin != __begin + size`. I don't have MSVC++ to check for myself... – Dietmar Kühl Nov 15 '14 at 02:06
  • I wonder how GCC and ICC behave. I don't have them in front of me to try it. – Mysticial Nov 15 '14 at 02:07
  • Complete list of compiler flags used? Can you double check that all of your "vectorization" and "NO vectorization" is correct? Your comment seems to disagree with some of the comments in the above code at first glance. Note that you did not bind the `range_expression` to an rvalue reference in your "definition of range for" samples. Your emulation is also wrong in a few ways: your `begin-expression` and `end-expression` do not line up perfectly with what range-for is supposed to do in a few cases. – Yakk - Adam Nevraumont Nov 15 '14 at 03:15
  • @Yakk My first comment was a test with std::vector using &numbers[0] instead of numbers.begin(), whereas the edit uses raw pointers and an array. About the "range-based for definition", I don't know what to do with "auto && __range = range_expression" because "__range" is not used (cf. link). Complete list of flags: /GS- /GL /analyze- /W3 /Zc:wchar_t /Zi /Gm- /Ox /Ob2 /Fd"Release\vc120.pdb" /fp:precise /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_LIB" /D "_MBCS" /fp:except- /errorReport:prompt /WX- /Zc:forScope /arch:AVX /Gd /Oy- /Oi /MD /Fa"Release\" /nologo /Fo"Release\" /Ot /Fp"Release\test.pch". – ThreeStarProgrammer57 Nov 16 '14 at 00:10
  • @realprog The begin expression and end expression use the range expression. There are 3 possible begin/end expressions mandated, involving members, `std::begin`, etc. – Yakk - Adam Nevraumont Nov 16 '14 at 02:30
  • So, here comes the advice: never try to reason about the implementation (temper) of a compiler, especially with respect to optimization :) – Lingxi Nov 17 '14 at 15:47
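
For reference, the point raised in the comments about binding the range expression can be sketched as follows. This is only an approximation of the expansion described on cppreference: the names `range_`, `begin_` and `end_` are hypothetical stand-ins for the reserved `__range`, `__begin` and `__end`, and `std::begin`/`std::end` are used as an assumption to cover both containers and arrays in one place (for a class type the language itself calls the member `begin()`/`end()`, to which `std::begin`/`std::end` forward).

```cpp
#include <iterator>
#include <vector>

// The form under test: a plain range-based for loop.
template <class Range>
int sum_range_for(const Range& numbers) {
    int sum = 0;
    for (auto&& number : numbers) sum += number;
    return sum;
}

// Hand-written expansion with the range expression bound first.
template <class Range>
int sum_expanded(const Range& numbers) {
    int sum = 0;
    {
        auto&& range_ = numbers;  // bind the range expression once
        for (auto begin_ = std::begin(range_), end_ = std::end(range_);
             begin_ != end_; ++begin_) {
            auto&& number = *begin_;  // the loop variable declaration
            sum += number;
        }
    }
    return sum;
}
```

Both functions accept a `std::vector<int>` as well as a raw `int` array, so the same pair can be compiled for either test case and the vectorization reports compared.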

1 Answer

My guess is that a range-based for loop does not know up front whether the object is a vector, an array, or a linked list, so the compiler cannot decide beforehand to vectorize the loop. Range-based for loops are the equivalent of the foreach loop in other languages. There might be a way to hint to the compiler that it should vectorize the loop, using a macro, a pragma, or a compiler setting. To check, please try the code with other compilers and see what you get; I would not be surprised if you also got non-vectorized assembly from them.
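
One way to run that cross-compiler check is to put each loop form in its own function, so that each shows up separately in the vectorization report. The sketch below uses hypothetical function names and adds `std::accumulate` as a third form to compare. With MSVC the report comes from `/Qvec-report:2` as in the question; with GCC, `-O3 -fopt-info-vec` (or `-fopt-info-vec-missed`) gives similar output, and with Clang, `-O3 -Rpass=loop-vectorize` (or `-Rpass-missed=loop-vectorize`). No claim is made here about which of these loops any given compiler vectorizes.

```cpp
#include <numeric>
#include <vector>

// Range-based for loop.
int sum_with_range_for(const std::vector<int>& numbers) {
    int sum = 0;
    for (const auto& number : numbers) sum += number;
    return sum;
}

// Classic iterator loop, as in the first test of the question.
int sum_with_iterators(const std::vector<int>& numbers) {
    int sum = 0;
    for (auto it = numbers.begin(); it != numbers.end(); ++it) sum += *it;
    return sum;
}

// Standard algorithm, which leaves the loop to the library implementation.
int sum_with_accumulate(const std::vector<int>& numbers) {
    return std::accumulate(numbers.begin(), numbers.end(), 0);
}
```

All three return the same value, so any difference in the reports or in the generated assembly comes from how the compiler treats each loop form.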