5

I have tried to allocate space for 10^7 integers in heap and stack memory to see which one is faster. Obviously allocating in heap-memory was much faster but I don't understand the reason.

#include <bits/stdc++.h>
#include <chrono>

using namespace std;
using namespace std::chrono;

int main()
{
  high_resolution_clock::time_point t1 = high_resolution_clock::now();

  int *p = new int[1e7];

  high_resolution_clock::time_point t2 = high_resolution_clock::now();
  auto duration = duration_cast<microseconds>( t2 - t1 ).count();
  cout << duration / 1e6 << "\n"; // 5e-06



  t1 = high_resolution_clock::now();

  vector<int> v(1e7);

  t2 = high_resolution_clock::now();
  duration = duration_cast<microseconds>( t2 - t1 ).count();
  cout << duration / 1e6 << "\n"; // 0.112284

  return 0;
}
mrflash818
  • 930
  • 13
  • 24
Omar Yasser
  • 135
  • 1
  • 5
  • 17
    Where do you think you allocated space for 10^7 integers on the stack? – Sneftel Jul 30 '19 at 14:20
  • https://stackoverflow.com/questions/24057331/is-accessing-data-in-the-heap-faster-than-from-the-stack – M. Suleiman Jul 30 '19 at 14:21
  • 3
    [Why one shouldn't include bits/stdc++.h](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h)... – Aconcagua Jul 30 '19 at 14:23
  • 1
    [About using namespace std](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice)... – Aconcagua Jul 30 '19 at 14:24
  • At least in C, heap memory allocation can also "lie" and report a successfully allocation of huge amounts of memory. However, that memory isn't actually reserved by the OS until you try to write to it. – Reticulated Spline Jul 30 '19 at 14:24
  • I'm trying to find the dupe for this. What it boils down to is `int *p = new int[1e7];` doesn't actually allocate any memory until you start to use it – NathanOliver Jul 30 '19 at 14:27
  • 2
    @ReticulatedSpline *However, that memory isn't actually reserved by the OS until you try to write to it.* That's a "feature" of Linux (and a few other OSes) which can be disabled if you value system stability. – Andrew Henle Jul 30 '19 at 14:27
  • 2
    `std::vector` allocates its data on the heap as well! Only a few member variables, including a pointer, actually reside on the stack; try `sizeof(std::vector)`, giving the number of bytes that really are allocated on stack... – Aconcagua Jul 30 '19 at 14:27
  • "_Obviously_ allocating in heap-memory was much faster but _I don't understand the reason_" - then it's not so obvious, is it? The thing is, the vector actually stores all the data in the heap too. You can't see the pointers it uses to refer to the allocated region because they're hidden by the implementation, but they're there. Creating the vector is slower than doing `new ...` because it does more stuff under the hood than merely memory allocation. – ForceBru Jul 30 '19 at 14:28
  • For comparison: You might add [array initialisation](https://stackoverflow.com/a/2204380/1312382) to your code, the difference might disappear then... – Aconcagua Jul 30 '19 at 14:35

4 Answers4

17

new int[1e7] allocates space for 1e7 int values and doesn't initialize them.

vector<int> v(1e7); creates a vector<int> object on the stack, and the constructor for that object allocates space for 1e7 int values on the heap. It also initializes each int value to 0.

The difference in speed is because of the initialization.

To compare the speed of stack allocation you need to allocate an array on the stack:

int data[1e7];

but be warned: there's a good chance that this will fail because the stack isn't large enough for that big an array.

Pete Becker
  • 74,985
  • 8
  • 76
  • 165
7

I am only a beginner, but let me give what I understand mainly to test myself.

In

int *p = new int[1e7];

you are allocating consecutive memory for 10 million integers on the heap.

In

vector<int> v(1e7);

you are allocating on the stack memory for a vector<int> object. Among the members of that object there is a pointer to an int[1e7] on the heap, that is also allocated. Moreover, all the values in it are initialized with the value of int() (with 0s). See constructor (2) of std::vector.

embeddedPy
  • 189
  • 6
6

Other answers pointed out that there is at least a "hidden" initialization in vector constructor.

But your example has another problem: maybe it does not even measure what you think it does. Benchmarking unoptimized code in C++ is pretty much meaningless and properly timing optimized code is hard.

Let's take a look at your (modified for readability) example compiled by Clang with -O3 optimization level: godbolt link.

double test1() {
  high_resolution_clock::time_point t1 = high_resolution_clock::now();

  int *p = new int[1e7];

  high_resolution_clock::time_point t2 = high_resolution_clock::now();
  auto duration = duration_cast<microseconds>( t2 - t1 ).count();
  return duration / 1e6; // 5e-06
}

compiled to:

test1():                              # @test1()
        push    rbx
        call    std::chrono::_V2::system_clock::now()
        mov     rbx, rax
        call    std::chrono::_V2::system_clock::now()
        sub     rax, rbx
        movabs  rcx, 2361183241434822607
        imul    rcx
        mov     rax, rdx
        shr     rax, 63
        sar     rdx, 7
        add     rdx, rax
        cvtsi2sd        xmm0, rdx
        divsd   xmm0, qword ptr [rip + .LCPI0_0]
        pop     rbx
        ret
.LCPI1_0:
        .quad   4696837146684686336     # double 1.0E+6

First part does not even call operator new! Compiler saw through your program and realized that you never used allocated array so it removed allocation from resulting executable.

So first part of your program does not allocate array on heap at all when compiled with such settings making measurements meaningless.

I recommend to read about benchmarking and use specialized micro benchmark frameworks to make such tests. Take a look at Google Benchmark (and online QuickBench) and it's documentation.

Serikov
  • 1,159
  • 8
  • 15
  • I like your answer here, it makes an important point--I think you say "Benchmarking unoptimized code in C++ is pretty much meaningless" because in the end you will _probably_ allow the compiler to optimize the code. It might help to add that if that's what is meant. It seems to me if you never intend to optimize it, then benchmarking that might be what you want. – villaa Jul 30 '19 at 17:59
0

I would like to note that the stack allocation takes absolutely no time at the run time; all the work is done by compiler. The comparison is meaningless regardless of optimization.

Vlad Feinstein
  • 10,960
  • 1
  • 12
  • 27