2

I performed a small test to determine behavior of accessing a vector of pointers vs vector of values. It turns out that for small memory blocks both perform equally well, however, for large memory blocks there is a significant difference.

What is the explanation for such behavior?

For the code below, performed on my pc, the difference for D=0 is about 35% and for D=10 it is unnoticeable.

int D = 0;
int K = 1 << (22 - D);
int J = 100 * (1 << D);

int sum = 0;
std::vector<int> a(K);
std::iota(a.begin(), a.end(), 0);
long start = clock();
for (int j = 0; j < J; ++j)
    for (int i = 0; i < a.size(); ++i)
        sum += a[i];
std::cout << double(clock() - start) / CLOCKS_PER_SEC << " " << sum << std::endl;

sum = 0;
std::vector<int*> b(a.size());
for (int i = 0; i < a.size(); ++i) b[i] = &a[i];
start = clock();
for (int j = 0; j < J; ++j)
    for (int i = 0; i < b.size(); ++i)
        sum += *b[i];
std::cout << double(clock() - start) / CLOCKS_PER_SEC << " " << sum << std::endl;
Marcin Król
  • 1,555
  • 2
  • 16
  • 31
  • Pointers leads to indirection which leads to extra memory access. As well as more space being used which of course may overflow caches. – Some programmer dude Mar 08 '15 at 19:04
  • The `sum += a[i];` causes undefined behaviour by overflowing the bounds of a signed `int`. (and the same bug occurs elsehwere). Even if you think this shouldn't be a problem, it's a good idea to do your testing with code that does not invoke undefined behaviour. This can be fixed by changing to `unsigned int sum = 0;` (or some other unsigned type, although the selection of type may affect the benchmark). – M.M Mar 08 '15 at 19:39
  • Benchmarks don't mean much if you're not at max optimization level. If using `-O3` you will probably need to take other measures to prevent your loops being optimized out. Making `sum` be `volatile` ought to do it, but it'd be good to double-check the generated assembly to make sure the timing code isn't being reorded. Also you could use C++11's [high resolution clock](http://stackoverflow.com/a/5524138/1505939) instead of `clock`. – M.M Mar 08 '15 at 19:45

1 Answers1

1

Getting data from global memory is slow, so the CPU has a small bit of really fast memory to help memory access keep up with the processor. When handling memory requests, your computer will try and speed up future requests to an single integer or pointer in memory by requesting a whole bunch of them around the location you requested and storing them in cache. Once that fast memory is full is has to get rid of its least favorite bit whenever something new is requested.

Your small problems may fit entirely or substantially in cache and so memory access is super fast. Large problems can't fit in this fast memory so you have a problem. The vector is stored as K consecutive memory locations. When you access a vector of int it loads the int and a handful of his nearby values these can be used right away. However, when you load an int* it loads a pointer to an actual value as well as several other pointers. This takes up some memory. Then, when you dereference with * it loads the actual value and possibly some actual values nearby. This takes up more memory. Not only do you have to perform more work, but you also are filling up memory faster. The actual increase in time will vary as it is highly dependent on the architecture, operation (in this case +), and memory speeds. Also, your compiler will work quite hard to minimize the delays.

Steve
  • 3,957
  • 2
  • 26
  • 50