std::vector accesses memory almost twice as many times as raw pointer does - why?

Question

I have a rather basic question about the memory behavior of std::vector. I would like to understand why a simple loop through a vector is much slower than the same loop using a raw pointer.

This is code0:

#include<iostream>
#include<vector>
#include "utils.h" //contains Timer
using namespace std;

void ff(double * p, int n) {

  for (int i =0; i< n ; i++) p[i]+=i;

}

int main() {
  int n = 1e9;
  Timer t;  //comes from "utils.h"
  t.tic();     //start timer
  double * p = new double[n];
  ff(p, n);
  delete[] p;
  double tt = t.toc();     //stop timer
  cout << tt << endl;     // number of seconds recorded by t.
}

When I compile and run code0 I get the following output: 3.88309.

This is codeV:

#include<iostream>
#include<vector>
#include "utils.h"
using namespace std;

void ff(vector<double>& v, int n) {

  for (int i =0; i< n ; i++) v[i]+=i;

}

int main() {
  int n=1e9;
  Timer t;
  t.tic();
  vector<double> v(n);
  ff(v, n);
  double tt = t.toc();
  cout << tt << endl; 
}

When I compile and run codeV I get the following output: 5.25866.

As you can see, code0 and codeV are doing the same thing, but the first is implemented with pointer and the second with std::vector. However codeV is almost twice as slow as code0.

Let me remark that I compiled both codes without optimization flags and using g++.

Out of curiosity I ran valgrind and perf on code0 and codeV, and it turns out that:

1) codeV performs 455,378,126 cache references, while code0 only 185,640,714;

2) codeV has about 50% cache misses among its memory calls, while code0 less than 5%.

If compiled with optimization flags on, the difference in time become less marked, though those on memory are still noticeable. I would like to ask: what is going on ? Why is std::vector performing so much worse than a raw pointer on such a simple task?

Thank you for any insight on this issue!

Your code uses `std::vector` but your text says `std::array` - which is it? — Ken Y-N, Feb 05 '19 at 01:39
Your code isn't equivalent. To make it so, change it to `new double[n]();`. In fact, your first code has undefined behaviour because it reads an uninitialized value. — Kerrek SB, Feb 05 '19 at 01:40
You got undefined behavior by reading uninitialized values in your first code snippet. The vector version is properly initializing each double to `0` — Guillaume Racicot, Feb 05 '19 at 01:41
`Let me remark that I compiled both codes without optimization flags` Which really makes your comparison invalid. Does your vector implementation have iterator and bounds checking in a debug build? — Retired Ninja, Feb 05 '19 at 01:41
Indeed, a comparison without optimization is not a fair one. Many implementation has debug checks and abstraction are not inlined out in debug builds. — Guillaume Racicot, Feb 05 '19 at 01:43
Possible duplicate of [Is std::vector so much slower than plain arrays?](https://stackoverflow.com/questions/3664272/is-stdvector-so-much-slower-than-plain-arrays) — , Feb 05 '19 at 01:49
Why did you concentrate on performance instead of the output being different? If you see different output, investigate that first. — PaulMcKenzie, Feb 05 '19 at 02:11
Thanks for your comments.I mean vector and edited accordingly. Also using new double[]() the 'vector' version is slower than 'pointer' one. Unfortunately I can't answer your questions about bound-debug checking because I don't know what it is... in terminal I run: g++ -std=c++11 codeV.cpp -o codeV.out && ./codeV.out — fedeb, Feb 05 '19 at 02:56
Could bound checking be the reason why there are so many memory calls? — fedeb, Feb 05 '19 at 03:01
*in terminal I run: g++ -std=c++11 codeV.cpp -o codeV.o* -- This is meaningless without optimizations turned on. See the answer. — PaulMcKenzie, Feb 05 '19 at 03:17
If you [inspect](https://godbolt.org/z/zvipdX) the generated assembly with `-O3 -DNDEBUG` in godbolt (I inlined the call to avoid introducing template magic), well it looks like identical, so I guess something is wrong in your methodology. — Jack, Feb 05 '19 at 03:23

score 2 · Accepted Answer · answered Feb 05 '19 at 03:00

Without optimization, all of the C++ standard library will be slower than C code. Without optimization, all of C is slower than hand-written assembly. What do you expect?

In order to use std::vector without optimization, the compiler has to write out about four function calls for each write to a vector element, compared to the single array write in C.

Use at least -O2. -O3 -DNDEBUG is my preference. Use -flto and -fprofile-generate / -fprofile-use too. Look them up.

If you aren't writing C++ to be fast, why use it at all?

Will look them up and play with them. Thank you! – fedeb Feb 05 '19 at 03:21 — fedeb, Feb 05 '19 at 03:21

std::vector accesses memory almost twice as many times as raw pointer does - why?

1 Answers1