Why is make_shared slower than shared_ptr(new T)?

Question

Every article I have read says that make_shared is more efficient than shared_ptr<T>(new T) because it performs one memory allocation instead of two. But I tried this code:

#include <cstdio>
#include <ctime>

#include <memory>
#include <vector>

static const size_t N = 1L << 25;

int main(void) {

    clock_t start = clock();
        for ( size_t rcx = 0; rcx < N; rcx++ ) {
            auto tmp = std::shared_ptr<std::vector<size_t>>( new std::vector<size_t>( 1024 ) );
        }
    clock_t end = clock();
    printf("shared_ptr with new: %lf\n", ((double)end - start) / CLOCKS_PER_SEC);

    start = clock();
        for ( size_t rcx = 0; rcx < N; rcx++ ) {
            auto tmp = std::make_shared<std::vector<size_t>>( 1024 );
        }
    end = clock();
    printf("make_shared: %lf\n", ((double)end - start) / CLOCKS_PER_SEC);

    return 0;
}

I compiled it with:

g++ --std=c++14 -O2 test.cpp -o test

and got this result:

shared_ptr with new: 10.502945

make_shared: 18.581738

Same for boost::shared_ptr:

shared_ptr with new: 10.778537

make_shared: 18.962444

A related question has an answer saying that LLVM's libc++ is broken, but I use libstdc++ from GNU. So why is make_shared slower?

P.S. With -O3 optimization I got this result:

shared_ptr with new: 5.482464

make_shared: 4.249722

The same holds for boost::shared_ptr.

I ran your code and `make_shared` was faster with both `-O2` and `-O3` on a Linux server. What is your architecture, OS, and version of GCC, libstdc++, and glibc? — Daniel Langr, Nov 26 '19 at 07:12
With msvc `make_shared` is also faster. I get `shared_ptr with new: 12.624000 make_shared: 10.969000` release/x64/cl version 19.23.28106.4 --- with release/x86 it's even faster: `shared_ptr with new: 10.610000 make_shared: 7.620000` — Lukas-T, Nov 26 '19 at 07:15
@Zefick And what _does_ it mean in those cases in which it doesn't mean "faster"? — nada, Nov 26 '19 at 07:16
@nada For instance, that it consumes less memory resources. However, in the context of `make_shared`, it should be faster, since one allocation is avoided. — Daniel Langr, Nov 26 '19 at 07:20
@DanielsaysreinstateMonica Ubuntu 18.04, gcc 8.3, Linux 5.3.13 (with make-linux-fast-again kernel options), GLIBCXX_3.4.28. I got make_shared faster only with -O3. — BratSinot, Nov 26 '19 at 07:29
@DanielsaysreinstateMonica But with -march=native -mtune=native `shared_ptr with new: 19.046898` `make_shared: 10.471890`. CPU is Core i7-6700K. — BratSinot, Nov 26 '19 at 07:33
Shared with new is slower http://quick-bench.com/Ih7HfLwsYmhJpBW-Lu8VDrDO8Y4 — 273K, Nov 26 '19 at 07:39
Have you tried letting the `make_shared` loop run before the `new` loop? — xskxzr, Nov 26 '19 at 07:46
@xskxzr Before writing this code, I compiled the two loops as two separate programs and timed each with `time` from the console. — BratSinot, Nov 26 '19 at 08:18

J W · Answer 1 · 2019-11-26T08:28:42.163

score 0

Running a program on one computer and measuring its execution time tells you nothing about the program's performance in general, only about its performance on your device under the conditions at that moment. The result depends on, for example, your OS, compiler, library versions, other programs running on your device, and even your user name.
Since your program accesses main memory, it is possible that in your particular case memory happens to be laid out in a way that makes one variant faster to access. But if you start or terminate other software, change your user, or the OS rearranges main memory, the "performance" could look totally different.
So if you want credible data, you should run the program on several different devices and under different conditions. I would also recommend taking a look at profiler software.

edited Nov 26 '19 at 08:28

answered Nov 26 '19 at 08:22


It has an influence on the actual structure of memory. – J W Nov 26 '19 at 08:51
Most of the things you mention should have roughly the same impact on both versions being compared. – 463035818_is_not_an_ai Nov 26 '19 at 08:58
Not necessarily, depending on the memory structure the processor needs to load different lines into its cache. So it is totally possible that you run the same program with different users or on another system and get different results. – J W Nov 26 '19 at 12:17

score 0 · Answer 2 · answered Nov 30 '19 at 19:38

On your platform, std::vector likely has an optimization in its allocator. Remember, you actually have three possible allocations here, not two. There's the shared control object, the vector object itself, and the space for the initial 1,024 size_t objects.

By using make_shared, you lose the opportunity to take advantage of that optimization because the shared control structure and the vector get created at the same time. This is an unusual edge case and you will probably find that you don't see this with other objects and other usage patterns.

The effect you're seeing is likely quite fragile. You may not see it with other libraries, on other platforms, with other objects, with other vector sizes, and so on.

You've found one weird edge case where the usual advice happens to produce slightly slower results.

The first time, I tried with std::string, not std::vector, and I used a large string to defeat the small string optimization. — BratSinot, Dec 01 '19 at 19:03
