
When I run the following program (with optimization on), the for loop with the std::vector takes about 0.04 seconds while the for loop with the array takes 0.0001 seconds.

#include <iostream>
#include <vector>
#include <chrono>

int main()
{
    int len = 800000;
    int* Data = new int[len];

    int arr[3] = { 255, 0, 0 };
    std::vector<int> vec = { 255, 0, 0 };

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < len; i++) {
        Data[i] = vec[0];
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "The vector took " << elapsed.count() << "seconds\n";

    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < len; i++) {
        Data[i] = arr[0];
    }
    finish = std::chrono::high_resolution_clock::now();
    elapsed = finish - start;
    std::cout << "The array took " << elapsed.count() << "seconds \n";

    char s;
    std::cin >> s;
    delete[] Data;
}

The code is a simplified version of a performance issue I was having while writing a raycaster. The len variable corresponds to how many times the loop in the original program needs to run (400 pixels * 400 pixels * 50 maximum render distance). For complicated reasons (perhaps that I don't fully understand how to use arrays), I have to use a vector rather than an array in the actual raycaster. However, as this program demonstrates, that would only give me 20 frames per second, as opposed to the enviable 10,000 frames per second that using an array would supposedly give me (obviously, this is just a simplified performance test). Regardless of how accurate those numbers are, I still want to boost my frame rate as much as possible. So why is the vector performing so much slower than the array? Is there a way to speed it up? Thanks for your help.

And if there's anything else I'm doing oddly that might be affecting performance, please let me know. I didn't even know about optimization until researching an answer for this question, so if there are any more settings like that which might boost performance, please mention them (and I'd prefer if you explained where those settings are in the properties manager rather than the command line, since I don't yet know how to use the command line).

ElliotThomas
  • Which compiler, what optimization options? The most obvious problem with this code is that the vector loop runs first, so it pays the cost of page-faults for touching the newly-allocated `Data[]` for the first time. It's not even that big, so the 2nd loop probably gets L3 cache hits, too. Sanity-check your results by doing the tests in the other order. (Or just add a warm-up loop.) You also never use `Data[anything]` after the 2nd loop, so that memset might get optimized away. – Peter Cordes Feb 19 '20 at 05:36
  • Also you should do something opaque with the data in the array (like use random numbers and print out the unsigned sum) to prevent it being optimized out entirely. – Galik Feb 19 '20 at 05:36
  • I also wonder if `int tmp = vec[0];` would help the compiler know that the dynamically allocated memory for `Data[]` doesn't overlap with the dynamically allocated memory for `vec[]`, so it can compile into a vectorized memset without reloading every time. Nope, not a problem; MSVC and GCC both make the same fill loop. GCC uses a `movups` 16-byte store (if you use `-O3` to auto-vectorize), MSVC uses `rep stosd` which is a better choice for CPUs that might have 32 or 64-byte internal data paths. https://godbolt.org/z/s3O57Q. – Peter Cordes Feb 19 '20 at 05:44
  • Quickly tested it with [reversed order](http://coliru.stacked-crooked.com/a/6f3156a4a5e708f0) where the vector is faster and with a [warmup-loop](http://coliru.stacked-crooked.com/a/a7f70cdf3a270531) where both are equally fast. – Lukas-T Feb 19 '20 at 06:54
  • What exactly are you trying to measure? Is it `operator[]` speed? Your benchmark may not reproduce the issue your real code encounters. For example, the compiler may assume that `int* a` and `int* b` in your real code alias and generate slower code. – Maxim Egorushkin Feb 19 '20 at 10:33
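A minimal sketch of the aliasing point raised in the last two comments (function and parameter names are mine, not from the question): if the compiler cannot prove that the destination and source buffers are separate, it has to reload the source element on every iteration, whereas hoisting the read into a local removes that concern. As the comments note, this is not the cause in the posted program (the compiler can see both allocations), but it can matter in real code like the raycaster.

// If dst and src might overlap, src[0] must be reloaded on every iteration,
// because the store through dst could change it.
void fill_may_alias(int* dst, const int* src, int n)
{
    for (int i = 0; i < n; i++) {
        dst[i] = src[0];
    }
}

// Hoisting the read into a local tells the compiler the value cannot change,
// so the loop can become a plain memset/std::fill-style fill.
void fill_hoisted(int* dst, const int* src, int n)
{
    int value = src[0];
    for (int i = 0; i < n; i++) {
        dst[i] = value;
    }
}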

2 Answers

1

Let us observe how GCC optimizes this test program:

#include <vector>

int main()
{
    int len = 800000;
    int* Data = new int[len];

    int arr[3] = { 255, 0, 0 };
    std::vector<int> vec = { 255, 0, 0 };

    for (int i = 0; i < len; i++) {
        Data[i] = vec[0];
    }
    for (int i = 0; i < len; i++) {
        Data[i] = arr[0];
    }
    delete[] Data;
}

The compiler rightly notices that the vector is constant and eliminates it. Exactly the same code is generated for both loops, so it should be irrelevant whether a loop reads from the array or from the vector.

.L2:
    movups  XMMWORD PTR [rcx], xmm0
    add     rcx, 16
    cmp     rsi, rcx
    jne     .L2

What makes the difference in your test program is the order of the loops. The comments point out that when a warm-up loop is added before them, both timed loops take the same time.
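For completeness, here is a sketch of that experiment based on the question's program (my own variant, combining the warm-up loop and the "print a checksum" suggestions from the comments; it is not the code from the linked tests):

#include <iostream>
#include <vector>
#include <chrono>

int main()
{
    int len = 800000;
    int* Data = new int[len];

    int arr[3] = { 255, 0, 0 };
    std::vector<int> vec = { 255, 0, 0 };

    // Warm-up: touch every element of Data[] once, so neither timed loop
    // pays the page-fault cost of the first write to freshly allocated memory.
    for (int i = 0; i < len; i++) {
        Data[i] = 0;
    }

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < len; i++) {
        Data[i] = vec[0];
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "The vector took " << elapsed.count() << " seconds\n";

    start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < len; i++) {
        Data[i] = arr[0];
    }
    finish = std::chrono::high_resolution_clock::now();
    elapsed = finish - start;
    std::cout << "The array took " << elapsed.count() << " seconds\n";

    // Use the results so the compiler cannot discard the timed stores.
    long long sum = 0;
    for (int i = 0; i < len; i++) {
        sum += Data[i];
    }
    std::cout << "checksum: " << sum << "\n";

    delete[] Data;
}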

I would expect that with a modern compiler, accessing a vector is approximately as fast as accessing an array when optimization is enabled and debug checks are disabled. If there is an observable difference in your actual program, the problem lies somewhere else.

VLL
  • *accessing a vector would be as fast as accessing an array* generally yes for *repeated* accesses. But generally it's an extra layer of indirection. e.g. a function that gets a reference to a `std::vector` has to load the `.data()` pointer from memory, then use it. In the `int *data` case, you already have the data pointer directly. Or for an actual local array variable, there's not even any dynamic allocation at all, and the array contents are directly part of the stack frame just like 3 local vars. – Peter Cordes Feb 19 '20 at 12:29
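To illustrate the indirection described in the comment, a small sketch (function names are mine, not from the answer or the comment):

#include <vector>

// The function receives only a reference to the vector object, so it must
// first load the data pointer stored inside the vector, then load element 0.
int first_from_vector(const std::vector<int>& v)
{
    return v[0];
}

// The caller passes the element pointer directly, so there is one load fewer.
int first_from_pointer(const int* p)
{
    return p[0];
}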
-1

It is about caches. I don't know the details, but `Data[]` becomes better known to the CPU (cached) as it is used. If you reverse the order of the loops, you can see that "the vector is faster".

But actually, you are testing neither the vector nor the array. Suppose vec[0] resides at memory location 0x01 and arr[0] resides at 0xf1: the only difference between the loops is reading a word from a different single memory address. So what you are really testing is how fast you can assign a value to the elements of a dynamically allocated array.
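In other words, both timed loops boil down to something like this sketch (it reuses the question's variable names; std::fill is my illustration, not something the question uses):

#include <algorithm>
#include <vector>

int main()
{
    int len = 800000;
    int* Data = new int[len];
    std::vector<int> vec = { 255, 0, 0 };

    int value = vec[0];                  // the single read of the vector (or array)
    std::fill(Data, Data + len, value);  // assign that value to every element

    delete[] Data;
}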

Note: std::chrono::high_resolution_clock may not be appropriate for measuring time intervals; as cppreference suggests, it is better to use steady_clock.
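A minimal timing skeleton with steady_clock might look like this (my own sketch, not code from the answer):

#include <chrono>
#include <iostream>

int main()
{
    auto start = std::chrono::steady_clock::now();
    // ... code under test ...
    auto finish = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << elapsed.count() << " seconds\n";
}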

calynr
  • Why downvoted? – calynr Feb 19 '20 at 08:11
  • It's not just about caches, it's also about lazy allocation and page faults. But yes, the 2nd time is *so* fast because it also hits in cache. And yes, this is only testing writing to `Data[]`, not reading from either the vector or small array. But their address is irrelevant, and it isn't even being read inside the loop. The compiler is able to see that `Data[]` can't overlap with the dynamic allocation done by the vector, so it can hoist the load out of the loop and basically do a `std::fill`. I didn't downvote because some of this is kind of the right idea. – Peter Cordes Feb 19 '20 at 12:38
  • I mentioned the single address because I wanted to point out that we are not accessing non-contiguous, random addresses at each iteration, so cache hits and misses don't come into play. But if we were to compare the efficiency of the two containers, they would have to. I agree the idea was obscurely put, but it is not wrong. – calynr Feb 19 '20 at 13:53