C++ clang array much faster than clang vector and both gcc vector and array

Question

The code below shows my test cases. I compiled both with clang++ --std=c++11 -O2 and with g++ --std=c++11 -O2.

long long *ary = new long long[100000000]();
for (long long i = 0; i < 100000000; ++i)
    ary[i] = i;

std::vector<long long> vec(100000000, 0);
for (long long i = 0; i < 100000000; ++i)
    vec[i] = i;

For both, I timed the initialization alone, and then the initialization plus the for loop. The results are below:

GCC:

Array initialization only: 0.182s
Array initialization and for loop: 0.250s
Vector initialization only: 0.169s
Vector initialization and for loop: 0.252s

Clang:

Array initialization only: 0.004s
Array initialization and for loop: 0.004s
Vector initialization only: 0.150s
Vector initialization and for loop: 0.240s

The gcc results match the common belief that vectors are as fast as arrays, and the clang and gcc results for vector largely agree. However, the clang array results are ridiculous: the array is considerably faster than everything else. Does anyone have any idea why this is?

perhaps related to [clang vs gcc - optimization including operator new](http://stackoverflow.com/q/25668420/1708801) — Shafik Yaghmour, Mar 11 '15 at 18:57
There is no way Clang is actually executing that code. Major suspicion it's being optimised out due to not having any observable effects. — Lightness Races in Orbit, Mar 11 '15 at 19:00

score 8 · Accepted Answer · answered Mar 11 '15 at 18:57

A speedup of this size (0.004s versus 0.15s to 0.25s) tells you that your code was optimized out. Since your code does nothing visible, it is eligible to be deleted. Your benchmark is invalid.

usr

You are correct. I added std::cout << ary[50000000] << std::endl and the times became comparable. Thanks. – JamesLens Mar 11 '15 at 19:03
@JamesLens this may not be enough at some point in the future. The entire array/vector is a constant, and the optimizer might eventually see through it. I'd do it differently: initialize the entire vector from an external value, such as something derived from the current time. Then sum it and print the sum. Since the sum depends on a runtime value, not even a god-like compiler could delete it. – usr Mar 11 '15 at 19:08
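A minimal C++11 sketch of usr's suggestion follows, assuming a steady_clock reading is a good enough source of a value the compiler cannot know at compile time. The helper names runtime_seed, fill_and_sum and timed_run are hypothetical, not from the question. Because the stored values depend on the seed and the sum is printed, the fill loop can no longer be folded into a constant the way the original was. It only makes elimination harder, though, so it is still worth checking the generated assembly.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical helper: a value only known at run time, derived from the clock.
long long runtime_seed() {
    auto ticks = std::chrono::steady_clock::now().time_since_epoch().count();
    return static_cast<long long>(ticks % 1000);
}

// Hypothetical helper: fills vec with values that depend on seed, then sums
// every element so that none of the writes are dead.
long long fill_and_sum(std::vector<long long>& vec, long long seed) {
    for (std::size_t i = 0; i < vec.size(); ++i)
        vec[i] = static_cast<long long>(i) + seed;
    long long sum = 0;
    for (long long v : vec)
        sum += v;
    return sum;
}

// Hypothetical helper: times one fill-and-sum pass over n elements, prints the
// sum so the result is observable, and returns the elapsed time in seconds.
double timed_run(std::size_t n) {
    assert(n > 0);
    std::vector<long long> vec(n);          // zero-initialization, not timed
    const long long seed = runtime_seed();
    auto start = std::chrono::steady_clock::now();
    long long sum = fill_and_sum(vec, seed);
    auto stop = std::chrono::steady_clock::now();
    std::cout << "sum = " << sum << '\n';
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling timed_run(100000000) from main covers the 100-million-element case from the question; the vector's zero-initialization is deliberately kept outside the timed region so only the fill and the sum are measured.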

score 2 · Answer 2 · edited May 23 '17 at 12:28

The difference here is in how clang and gcc handle optimizing away calls to new. Consider the following code:

long long *ary = new long long[100000000]();
for (long long i = 0; i < 100000000; ++i)
    ary[i] = i;

Clang optimizes all of it away at -O2 (see it live):

xorl    %eax, %eax
retq

while gcc does not (see it live):

 movl   $800000000, %edi
 call   operator new[](unsigned long)
 leaq   800000000(%rax), %rcx
 movq   %rax, %rdx
.L2:
 movq   $0, (%rdx)
 addq   $8, %rdx
 cmpq   %rcx, %rdx
 jne    .L2
 xorl   %edx, %edx

The question is whether this is a valid optimization. One could argue that under the as-if rule it is not, since a call to new can result in observable behavior.

This was made a valid optimization in C++14 by proposal N3664: Clarifying Memory Allocation, but clang performed this optimization even before then; see this answer.
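To make the C++14 rule concrete, here is a hedged sketch; both function names are hypothetical. N3664 allows an implementation to omit calls to the replaceable global allocation functions and provide the storage some other way, so a compiler may reduce the first function to a plain return 42 and drop the second function's body entirely, much like the near-empty clang output shown for the question's code. The elision is permitted, not required, so the only way to know what a given compiler does is to compare the -O2 assembly from clang and gcc.

```cpp
#include <cassert>

// Hypothetical example: a new/delete pair with no other observable effect.
// Since C++14 the allocation may be omitted and the function compiled
// down to "return 42"; the compiler is not obliged to do so.
int allocate_and_read() {
    int* p = new int(42);
    int value = *p;
    delete p;
    return value;
}

// Hypothetical example modelled on the question's array case: the stored
// values are never read, so the loop is dead code and the new[]/delete[]
// pair may be omitted as well. Unlike the question, this version frees it.
void allocate_and_discard(long long n) {
    assert(n >= 0);
    long long* ary = new long long[n]();
    for (long long i = 0; i < n; ++i)
        ary[i] = i;
    delete[] ary;
}
```

Both functions behave identically whether or not the allocation is elided; the difference shows up only in the generated code and the running time, which is exactly why the question's benchmark measured nothing.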