3

The code below shows my test cases. I compiled both with clang++ --std=c++11 -O2 and with g++ --std=c++11 -O2.


long long *ary = new long long[100000000]();
for (long long i = 0; i < 100000000; ++i)
    ary[i] = i;

std::vector<long long> vec(100000000, 0);
for (long long i = 0; i < 100000000; ++i)
    vec[i] = i;

For each, I timed the initialization alone, and then the initialization together with the for loop. The results are below:

GCC:

  • Array initialization only: 0.182s
  • Array initialization and for loop: 0.250s
  • Vector initialization only: 0.169s
  • Vector initialization and for loop: 0.252s

Clang:

  • Array initialization only: 0.004s
  • Array initialization and for loop: 0.004s
  • Vector initialization only: 0.150s
  • Vector initialization and for loop: 0.240s

The gcc results agree with the common belief that vectors are as fast as arrays. Moreover, the clang and gcc results for vector pretty much agree. However, the clang results for the array are ridiculous, with the array performing considerably faster. Does anyone have any idea why this is?

JamesLens

2 Answers

8

A 25x speedup tells you that your code was optimized out. Since your code has no visible effect, it is eligible to be deleted. Your benchmark is invalid.

usr
  • You are correct. I added std::cout << arr[50000000] << std::endl and the time became comparable. Thanks. – JamesLens Mar 11 '15 at 19:03
  • @JamesLens Maybe this will not be enough at some point in the future. The entire array/vector is a constant, and the optimizer might eventually see through it. I'd do it differently: initialize the entire vector from an external value, such as something derived from the current time. Then sum it and print the sum. Since the sum is dynamic, it can never be deleted, even by God's compiler. – usr Mar 11 '15 at 19:08
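A minimal sketch of the approach suggested in the comment above, assuming the seed is taken from a clock at run time. The helper names `runtime_seed` and `fill_and_sum` are made up for illustration and do not appear in the original post.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: derives a small value from the clock, so the
// compiler cannot know the vector's contents at compile time.
long long runtime_seed() {
    auto ticks = std::chrono::steady_clock::now().time_since_epoch().count();
    return static_cast<long long>(ticks % 1000);
}

// Hypothetical helper: fills a vector with seed, seed + 1, ... and
// returns the sum of all elements. The caller is expected to print the
// result, which makes the work observable.
long long fill_and_sum(std::size_t n, long long seed) {
    std::vector<long long> vec(n, 0);
    for (std::size_t i = 0; i < n; ++i)
        vec[i] = seed + static_cast<long long>(i);
    return std::accumulate(vec.begin(), vec.end(), 0LL);
}
```

In a benchmark, one would time something like `std::cout << fill_and_sum(100000000, runtime_seed()) << std::endl;` from `main`. Because the printed sum depends on a run-time value, the compiler cannot simply drop the computation, although it remains free to transform the loop.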
2

The difference here is how clang and gcc deal with optimizing away calls to new. Take the following code:

long long *ary = new long long[100000000]();
for (long long i = 0; i < 100000000; ++i)
    ary[i] = i;

clang will optimize it all away at the -O2 optimization level (see it live):

xorl    %eax, %eax
retq

while gcc will not (see it live):

 movl   $800000000, %edi
 call   operator new[](unsigned long)
 leaq   800000000(%rax), %rcx
 movq   %rax, %rdx
.L2:
 movq   $0, (%rdx)
 addq   $8, %rdx
 cmpq   %rcx, %rdx
 jne    .L2
 xorl   %edx, %edx

The question is whether this is a valid optimization. One could argue that, by the as-if rule, it is not, since a call to new can result in observable behavior.

This was made a valid optimization in C++14 by proposal N3664: Clarifying Memory Allocation, but clang implemented this optimization before then; see this answer.
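The "observable behavior" argument can be made concrete with a small sketch, assuming a replaced global `operator new[]` that counts its calls. The counter `g_array_new_calls` and the function `count_calls_for_unused_array` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter: number of calls made to the replaced operator new[].
static std::size_t g_array_new_calls = 0;

// Replacement global array allocation function that records each call.
void* operator new[](std::size_t size) {
    ++g_array_new_calls;
    if (void* p = std::malloc(size ? size : 1))
        return p;
    throw std::bad_alloc();
}

// Matching deallocation functions (unsized and sized forms).
void operator delete[](void* p) noexcept { std::free(p); }
void operator delete[](void* p, std::size_t) noexcept { std::free(p); }

// Allocates an array whose contents are never read and reports how many
// times the replaced operator new[] was actually called for it.
std::size_t count_calls_for_unused_array() {
    const std::size_t before = g_array_new_calls;
    long long* ary = new long long[1000]();
    for (long long i = 0; i < 1000; ++i)
        ary[i] = i;
    delete[] ary;
    return g_array_new_calls - before;
}
```

Depending on the compiler and optimization level, `count_calls_for_unused_array()` may return 1 (the allocation was performed) or 0 (the allocation was elided). Under the C++14 wording, either result is conforming, which is why the counter cannot be relied on to force the allocation.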

Shafik Yaghmour