
I have a question about the cost of instantiating a std::vector. I compared instantiating a std::vector with dynamically allocating an array of the same size. I expected the std::vector to take slightly longer, but the difference in performance is huge.

For the array I measure 53 us; for the std::vector I measure 4338 us.

My code:

#include <chrono>
#include <cstdlib>
#include <vector>
#include <iostream>

int main() {
    unsigned int NbItem = 1000000 ;
    std::chrono::time_point<std::chrono::system_clock> start, middle ,end;
    start = std::chrono::system_clock::now() ;
    float * aMallocArea = (float *)calloc(sizeof(float)*NbItem,0) ;
    middle = std::chrono::system_clock::now() ;
    std::vector<float> aNewArea ;
    middle = std::chrono::system_clock::now() ;
    aNewArea.resize(NbItem) ;
    //float * aMallocArea2 = new float[NbItem];
    end = std::chrono::system_clock::now() ;
    std::chrono::duration<double> elapsed_middle = middle-start;
    std::chrono::duration<double> elapsed_end = end-middle;
    std::cout << "ElapsedTime CPU  = " << elapsed_middle.count()*1000000 << " (us) " << std::endl ;
    std::cout << "ElapsedTime CPU  = " << elapsed_end.count()*1000000 << " (us) " << std::endl ;
    free(aMallocArea) ;
    return 0;
}

Even if I create a vector of size 0, I see this difference. Do you know why instantiating a std::vector performs so badly, and how I can improve it? (I tried the -O3 compilation option, but it made no significant difference.)

Compilation line: g++ --std=c++11 -o test ./src/test.cpp

Compiler version (g++ --version):

g++ (Debian 4.7.2-5) 4.7.2
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Cœur
clam37
  • The first time instantiation of `std::vector` may incur overhead. But what about subsequent instantiations? – Bathsheba Dec 10 '14 at 10:53
  • Please provide your compiler version and compilation command line. – Angew is no longer proud of SO Dec 10 '14 at 10:57
  • `calloc(sizeof(float)*NbItem,0)` This is a bit weird, allocating objects of size 0. Not that it is the issue, but I would try allocating objects of at least size 1 and see if it behaves any differently. – DumbCoder Dec 10 '14 at 11:09
  • Possible duplicate: http://stackoverflow.com/questions/3664272/stdvector-is-so-much-slower-than-plain-arrays?rq=1 – nchen24 Dec 10 '14 at 11:10
  • Passing a size of 0 to `calloc` is implementation-defined and makes the whole comparison completely worthless, since it might just return a null pointer without doing any allocation. See: http://en.cppreference.com/w/cpp/memory/c/calloc – AliciaBytes Dec 10 '14 at 11:13
  • @clam37 `calloc` actually seems to be the slowest and `new` the fastest, with `vector.reserve()` being a bit of a cheat, in my own tests: [using clang](http://coliru.stacked-crooked.com/a/0dbe7e797920fe06), [using gcc](http://coliru.stacked-crooked.com/a/6cc1b85cd639d268). Algorithms are only used to prevent the compiler from optimizing out the whole thing. – AliciaBytes Dec 10 '14 at 12:00
  • The time taken by the allocation function is only the tip of the iceberg. [This article](http://randomascii.wordpress.com/2014/12/10/hidden-costs-of-memory-allocation/) has a good discussion of some of the other factors. – Alan Stokes Dec 10 '14 at 14:30

1 Answer


Do you realize that this:

float * aMallocArea = (float *)calloc(sizeof(float)*NbItem, 0);

means "allocate sizeof(float)*NbItem items, each of size zero"? The arguments are swapped relative to calloc(count, size), so the call requests an allocation of zero bytes.
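For comparison, a minimal sketch of the call with the arguments in the intended order. The helper name allocate_zeroed_floats is hypothetical and not part of the question's code; the sketch also assumes a platform where all-zero bytes represent 0.0f (true for IEEE 754 floats).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: calloc(count, size) requests `count` elements of
// `size` bytes each and zero-fills them. Returns nullptr on failure;
// the caller must release the result with std::free().
float* allocate_zeroed_floats(std::size_t count) {
    assert(count > 0);  // a zero-byte request is implementation-defined
    return static_cast<float*>(std::calloc(count, sizeof(float)));
}
```

Called as allocate_zeroed_floats(NbItem), this requests NbItem * sizeof(float) bytes, which is what the original test presumably meant to measure.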

Even once you do correct this, the calloc form will be much faster in many cases. calloc implementations are capable of merely reserving a region of virtual memory and returning a pointer to it. Only when you access the memory does the OS actually map it.

A vector on the other hand, actually goes through and initializes/constructs its elements. No implementation I know of checks to see that a) the type is POD, b) memory is zero, and c) that the allocator returns zeroed memory. So this initialization process can cost quite a bit, compared to calloc.
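A small sketch of the difference between constructing elements and merely acquiring capacity, which is also why one commenter calls `vector.reserve()` a bit of a cheat in a timing comparison. The function names make_resized and make_reserved are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

// resize() value-initializes every new element, so the whole buffer
// is written (each float becomes 0.0f) before the call returns.
std::vector<float> make_resized(std::size_t count) {
    std::vector<float> v;
    v.resize(count);
    return v;
}

// reserve() only acquires capacity: no elements are constructed and
// size() stays 0, so this does much less work than resize().
std::vector<float> make_reserved(std::size_t count) {
    std::vector<float> v;
    v.reserve(count);
    return v;
}
```

With reserve(), elements still have to be added later (for example with push_back) before they can be read, so the initialization cost is deferred rather than avoided.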

So the "C" version does next to nothing (if you fix your program), and the "C++" version goes through, initializes every element, and touches all the memory in the allocation. It will be much slower.

That is very rarely a good reason to favor the C version, even where performance matters. In practice, you should only allocate memory you actually need. Once you start using the memory for something, the times will even out (e.g. in the C version, it will take time to map the memory when you access it later on). If you were to create a second timed test which (say) computed the average of the arrays' elements, the C++ version would likely be faster on your implementation because the memory is already mapped and initialized, whereas the C version would perform mapping and initialization as you read the memory.
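The second timed test suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the Timing struct and the functions time_calloc and time_vector are hypothetical names, steady_clock is used here instead of the question's system_clock, and an optimizing compiler may still simplify parts of the work, so the measured values should be treated with caution.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

struct Timing {
    double allocate_us;  // time to obtain the buffer
    double sum_us;       // time to read every element once
    float sum;           // returned to make it harder to discard the reads
};

using Clock = std::chrono::steady_clock;

static double elapsed_us(Clock::time_point from, Clock::time_point to) {
    return std::chrono::duration<double, std::micro>(to - from).count();
}

// C version: calloc may return quickly; pages can be mapped on first access.
Timing time_calloc(std::size_t count) {
    const auto t0 = Clock::now();
    float* data = static_cast<float*>(std::calloc(count, sizeof(float)));
    const auto t1 = Clock::now();
    const float sum = data ? std::accumulate(data, data + count, 0.0f) : 0.0f;
    const auto t2 = Clock::now();
    std::free(data);
    return {elapsed_us(t0, t1), elapsed_us(t1, t2), sum};
}

// C++ version: the constructor initializes (and so touches) every element.
Timing time_vector(std::size_t count) {
    const auto t0 = Clock::now();
    std::vector<float> data(count);
    const auto t1 = Clock::now();
    const float sum = std::accumulate(data.begin(), data.end(), 0.0f);
    const auto t2 = Clock::now();
    return {elapsed_us(t0, t1), elapsed_us(t1, t2), sum};
}
```

Comparing allocate_us + sum_us for both versions, rather than allocate_us alone, gives a fairer picture of the total cost of allocating and then using the memory.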

justin
  • For more information on the overhead when the memory is actually touched (especially when it is written): http://randomascii.wordpress.com/2014/12/10/hidden-costs-of-memory-allocation/ – Bruce Dawson Dec 11 '14 at 19:04
  • Thank you for helping me to understand my issue. I will try to implement the test you have suggested. – clam37 Dec 15 '14 at 14:09