1

I tried benchmarking how fast arrays were compared to vectors. However, I found out that no matter how much I increase the amount it loops, it always ends with 0 nanoseconds or 100 nanoseconds. So I got a little crazy and here's my loop:

#define MAX_VAL 15000
std::chrono::steady_clock::time_point start, end;

    std::cout << "Measuring array...\n";
    start = std::chrono::steady_clock::now();
    for (__int64 a = 0; a < LLONG_MAX; a++)
    {
        for (__int64 b = 0; b < LLONG_MAX; b++)
        {
            for (__int64 c = 0; c < LLONG_MAX; c++)
            {
                for (int i = 1; i < MAX_VAL - 1; i++)
                {
                    arr[i] += arr[i + 1];
                    arr[i - 1] -= arr[i];
                    arr[i] *= 2;
                }
            }
        }
    }
    end = std::chrono::steady_clock::now();

    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << "\n";

When I printed out the values of the array before the loop, it actually started taking time to do things.

I'm compiling with o2 in release mode. Is this the reason that data not read in the code had its operations skipped?

  • 12
    Yes, if your code does nothing the compiler is free to change the code to just do nothing instead of spending a lot of time doing nothing. Both leave you at the same place in the end, with nothing. This is called the As-if rule. – NathanOliver Aug 29 '22 at 13:41
  • 4
    Yes, it's called the [as-if-rule](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule). – Lukas-T Aug 29 '22 at 13:47
  • 1
    Check the generated assembly and you'll likely see that the compiler just removed all of your code since doing so has no observable effect (as per the C++ standards rules for observable behaviour). – Jesper Juhl Aug 29 '22 at 13:47
  • A side note: the inner loop for the array is executed within 3 levels loops, each with LLONG_MAX (~2^63), so overall more than 2^180 iterations, which is a huge number. If the code was actually executed, you would never have seen the time measurement (in some reasonable time). – wohlstad Aug 29 '22 at 13:54
  • 1
    Side note : use std::uint64_t from , types starting with two __ are reserved for internal use by the compiler. And instead of using #define to declare a constant, use `static constexpr std::uint64_t max_val{15000ul}` to generate a typesafe constant (evaluated at compile time) – Pepijn Kramer Aug 29 '22 at 14:00
  • 1
    Try using the array after the benchmark. After this line: `end = std::chrono::steady_clock::now();` – drescherjm Aug 29 '22 at 14:03
  • 1
    *I tried benchmarking how fast arrays were compared to vectors* -- You will probably find little, to no difference, even if the code was not optimized away. – PaulMcKenzie Aug 29 '22 at 14:16
  • Print the assembly language of the `for` loop, with vector and with array. Compare the results. They should be very similar, depending on your optimization level. One or two instructions difference won't be noticeable by most people. – Thomas Matthews Aug 29 '22 at 15:53

2 Answers2

2

please take a look at gcc manual to find more about optimization flags in gcc and optimization levels , but to sum up the -O2 optimization will change your code and make a loop optimization in your code , I generated 2 assembly files using GCC , first one is without any optimization and the second is with optimization level set to -O2

with no optimization

take a look at the next assembly part of the file codes and the full file can be found at this link:

     # main.cpp:16:             for (__int64 c = 0; c < LLONG_MAX; c++)
    movl    $0, -48(%ebp)    #, c
    movl    $0, -44(%ebp)    #, c
L10:
 # main.cpp:16:             for (__int64 c = 0; c < LLONG_MAX; c++)
    movl    -48(%ebp), %eax  # c, tmp108
    xorl    $-1, %eax    #, tmp108
    movl    %eax, -60104(%ebp)   # tmp108, %sfp
    movl    -44(%ebp), %eax  # c, tmp109
    xorl    $2147483647, %eax    #, tmp109
    movl    %eax, -60100(%ebp)   # tmp109, %sfp
    movl    -60104(%ebp), %ecx   # %sfp, tmp107
    movl    -60100(%ebp), %ebx   # %sfp,
    movl    %ebx, %eax   #, tmp110
    orl %ecx, %eax   # tmp107, tmp110
    testl   %eax, %eax   # tmp110
    je  L7   #,
 # main.cpp:18:                 for (int i = 1; i < MAX_VAL - 1; i++)
    movl    $1, -52(%ebp)    #, i
L9:
 # main.cpp:18:                 for (int i = 1; i < MAX_VAL - 1; i++)
    cmpl    $14998, -52(%ebp)    #, i
    jg  L8   #,
 # main.cpp:20:                     arr[i] += arr[i + 1];
    movl    -52(%ebp), %eax  # i, tmp111
    movl    -60088(%ebp,%eax,4), %edx    # arr, _1
 # main.cpp:20:                     arr[i] += arr[i + 1];
    movl    -52(%ebp), %eax  # i, tmp112
    addl    $1, %eax     #, _2
 # main.cpp:20:                     arr[i] += arr[i + 1];
    movl    -60088(%ebp,%eax,4), %eax    # arr, _3
 # main.cpp:20:                     arr[i] += arr[i + 1];
    addl    %eax, %edx   # _3, _4
    movl    -52(%ebp), %eax  # i, tmp113
    movl    %edx, -60088(%ebp,%eax,4)    # _4, arr
 # main.cpp:21:                     arr[i - 1] -= arr[i];

you will find lines like # main.cpp:16: for (__int64 c = 0; c < LLONG_MAX; c++) indicating the equivalent line of that line in assembly code, also you will find alot of jmp commands and labels which express loops , by the way to generate this file, use the following command line in your cmd or terminal : g++ -fverbose-asm main.cpp -S main.s.


using -O2 optimization

take a look at the next assembly part of the file codes and the full file can be found at this link:

     # main.cpp:6: int main() {
    call    ___main  #
 # main.cpp:10:     std::cout << "Measuring array...\n";
    movl    $LC0, 4(%esp)    #,
    movl    $__ZSt4cout, (%esp)  #,
    call    __ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc     #
 # main.cpp:11:     start = std::chrono::steady_clock::now();
    call    __ZNSt6chrono3_V212steady_clock3nowEv    #
    movl    %eax, %esi   # tmp92, start
    movl    %edx, %edi   #, start
 # main.cpp:27:     end = std::chrono::steady_clock::now();
    call    __ZNSt6chrono3_V212steady_clock3nowEv    #
 # c:\mingw\lib\gcc\mingw32\9.2.0\include\c++\ostream:202:       { return _M_insert(__n); }
    movl    $__ZSt4cout, %ecx    #,
 # c:\mingw\lib\gcc\mingw32\9.2.0\include\c++\chrono:469:   return __cd(__cd(__lhs).count() - __cd(__rhs).count());
    subl    %esi, %eax   # start, tmp89
    sbbl    %edi, %edx   # start,
 # c:\mingw\lib\gcc\mingw32\9.2.0\include\c++\ostream:202:       { return _M_insert(__n); }
    movl    %eax, (%esp)     # tmp89,
    movl    %edx, 4(%esp)    #,
    call    __ZNSo9_M_insertIxEERSoT_    #
    subl    $8, %esp     #,
 # main.cpp:29:     std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << "\n";
    movl    $LC1, 4(%esp)    #,
    movl    %eax, (%esp)     # tmp94,
    call    __ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc     #
 # main.cpp:32: }

you will notice that it's a smaller file in terms of lines of code and it runs faster. because in the optimized version ,the compiler removed all of your loops , you will find only one jmp command in this file. so yes optimization affects your code. and you will not find any nested loops in your optimized version of this code , so yes the -O2 optimization does affect your code., by the way to generate this file, use the following command line in your cmd or terminal : g++ -O2 -fverbose-asm main.cpp -S main.s.


solving this problem

to prevent the compiler from optimizing any part of code , you can make this code a separate function and mark this function with the keyword volatile where volatile prevents the compiler from optimizing this part of code , also volatile can be used with variables not only function to protect them from optimization , refer to geeks for geeks for more examples and information , but for your code , to protect it from optimization , you can do this , but by the way it's not the best practice :

#include <iostream>
#include <chrono>
#include <climits>

volatile void func()
{
    #define MAX_VAL 15000
    std::chrono::steady_clock::time_point start, end;
    int arr[MAX_VAL]; 
    start = std::chrono::steady_clock::now();
    for (__int64 a = 0; a < LLONG_MAX; a++)
    {
        for (__int64 b = 0; b < LLONG_MAX; b++)
        {
            for (__int64 c = 0; c < LLONG_MAX; c++)
            {
                for (int i = 1; i < MAX_VAL - 1; i++)
                {
                    arr[i] += arr[i + 1];
                    arr[i - 1] -= arr[i];
                    arr[i] *= 2;
                }
            }
        }
    }
    end = std::chrono::steady_clock::now();

    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << "\n";
}

int main() {
    std::cout << "Measuring array...\n";
    func();
    return 0;
}
abdo Salm
  • 1,678
  • 4
  • 12
  • 22
  • 1
    *refer to geeks for geeks* -- Please, not that site. It has wrong and misleading information on very many topics. – PaulMcKenzie Aug 29 '22 at 19:24
1

Aside from your problems with optimizations: There is no point in comparing arrays and vectors that way.

There are two reasons for performance diffenences:

  1. The Allocation is different. For arrays there is just one pointer increment which is virtually free. For vectors you have to request memory from the heap. The exact cost depends on much factors but it is much costlier than stack allocations. As your code is always doing one allocation you will propably see no effect.
  2. The Location is different. This might have affects on the cache locality and alignment. This is usually a minor effect visible in scenarios with entangled data structures and sizes above the CPU cache. There is no point in benchmarking this other than to tweak a specific implementation.

It makes sense to compare the penalty of vector when you constantly need to create and delete small objects. You will propably see no difference for read/write operations.

m2j
  • 123
  • 8