Smart pointer vs regular pointer for performance

Question

Is there an advantage or disadvantage to using smart pointers vs regular pointers regarding performance?

I am running the following code, compiled with VS2019 in release and debug.

These are the results for release:

Assign Ptr time       = 0.3285ms
Assign Smart ptr time = 0.101ms
Sum Ptr = 126756464
Sum Smart Ptr = 126756464
Sum Ptr time          = 0.2124ms
Sum Smart ptr time    = 0.2912ms

These are the results for debug:

Assign Ptr time       = 1.8149ms
Assign Smart ptr time = 15.8177ms
Sum Ptr = 126756464
Sum Smart Ptr = 126756464
Sum Ptr time          = 1.8392ms
Sum Smart ptr time    = 15.9617ms

Code

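#include <chrono>
#include <cstdint>  // uint8_t
#include <cstdio>   // getchar
#include <cstdlib>  // rand
#include <iostream>
#include <memory>   // unique_ptr, make_unique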

#define HEIGHT 1000
#define WIDTH  1000

int main(void)
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;


    uint8_t matrix[HEIGHT * WIDTH];
    uint8_t* matrixPtr = new uint8_t[HEIGHT * WIDTH];
    std::unique_ptr<uint8_t[]> matrixSmartPtr = std::make_unique<uint8_t[]>(HEIGHT * WIDTH);
    

    int index = 0;
    for (int i = 0; i < HEIGHT; i++)
    {
        for (int j = 0; j < WIDTH; j++)
        {
            matrix[index] = rand() % 255;
            index++;
        }
    }

    index = 0;
    auto t1 = high_resolution_clock::now();
    for (int i = 0; i < HEIGHT; i++)
    {
        for (int j = 0; j < WIDTH; j++)
        {
            matrixPtr[index] = matrix[index];
            index++;
        }
    }
    auto t2 = high_resolution_clock::now();

    index = 0;
    auto t3 = high_resolution_clock::now();
    for (int i = 0; i < HEIGHT; i++)
    {
        for (int j = 0; j < WIDTH; j++)
        {
            matrixSmartPtr.get()[index] = matrix[index];
            index++;
        }
    }
    auto t4 = high_resolution_clock::now();


    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_assign_n = t2 - t1;
    duration<double, std::milli> ms_assign_s = t4 - t3;

    std::cout << "Assign Ptr time       = " << ms_assign_n.count() << "ms" << std::endl;
    std::cout << "Assign Smart ptr time = " << ms_assign_s.count() << "ms" << std::endl;

    int sumA = 0;
    index = 0;
    auto t5 = high_resolution_clock::now();
    for (int i = 0; i < HEIGHT; i++)
    {
        for (int j = 0; j < WIDTH; j++)
        {
            sumA += matrixPtr[index];
            index++;
        }
    }
    auto t6 = high_resolution_clock::now();


    std::cout << "Sum Ptr = " << sumA << std::endl;

    int sumB = 0;
    index = 0;
    auto t7 = high_resolution_clock::now();
    for (int i = 0; i < HEIGHT; i++)
    {
        for (int j = 0; j < WIDTH; j++)
        {
            sumB += matrixSmartPtr.get()[index];
            index++;
        }
    }
    auto t8 = high_resolution_clock::now();

    std::cout << "Sum Smart Ptr = " << sumB << std::endl;

    /* Getting number of milliseconds as a double. */
    duration<double, std::milli> ms_sum_n = t6 - t5;
    duration<double, std::milli> ms_sum_s = t8 - t7;

    std::cout << "Sum Ptr time          = " << ms_sum_n.count() << "ms" << std::endl;
    std::cout << "Sum Smart ptr time    = " << ms_sum_s.count() << "ms" << std::endl;
    
    delete[] matrixPtr;

    std::cout << "Press enter to finish" << std::endl;
    std::getchar(); // Keep the program from exiting immediately

    return 0;
}

I don't understand why, in release, the assignment is faster with the smart pointer, while for the sum the results are similar or even worse.

Why is the smart pointer way worse in debug mode?

If you look at the compiled output, you can see that the `unique_ptr` gets compiled out and is just a regular pointer. Any difference you're seeing in release mode is just noise. – ChrisMM, Apr 07 '21 at 16:22
[In rare cases](https://stackoverflow.com/q/58339165/2752075) `unique_ptr` will be slower. But it shouldn't worry you, since the safety it gives outweighs the possible tiny performance impact. – HolyBlackCat, Apr 07 '21 at 16:29

ChrisMM · Accepted Answer · 2021-04-07T16:40:52.433

In release mode, what you're seeing is just noise. The unique_ptr gets compiled out, as the generated assembly shows (g++ output for the two assignment loops):

Raw Pointer:

        mov     edx, 1000000
        mov     rsi, rsp
        mov     rdi, r13
        mov     rbp, rax
        call    memcpy

Unique Pointer:

        mov     edx, 1000000
        mov     rsi, rsp
        mov     rdi, r12
        mov     r14, rax
        call    memcpy

Even the creation of the unique_ptr compiles to just:

call    operator new[](unsigned long)

And at the end, there's a call to

call    operator delete[](void*)

Note: as HolyBlackCat mentioned in a comment, there are rare cases where unique_ptr is slower.
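One way to check whether a sub-millisecond difference is noise is to repeat each measurement several times and keep the fastest run. The following is only a minimal sketch of that idea: `best_of_ms` and `sum_bytes` are hypothetical helpers, not part of the original post, and `std::chrono::steady_clock` is swapped in for `high_resolution_clock` because it is guaranteed to be monotonic.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>

// Hypothetical helper: calls `work` `runs` times and returns the duration
// of the fastest call in milliseconds. Taking the minimum damps one-off
// disturbances such as cold caches or scheduler interruptions.
template <typename F>
double best_of_ms(int runs, F&& work)
{
    assert(runs > 0);
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < runs; ++r)
    {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}

// Hypothetical helper: sums n bytes through a plain pointer. The same
// function can be fed either a raw pointer or the result of
// unique_ptr<uint8_t[]>::get(), so both variants run identical code.
inline long long sum_bytes(const std::uint8_t* data, std::size_t n)
{
    long long sum = 0;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    return sum;
}
```

A call could look like `best_of_ms(20, [&] { sink = sum_bytes(matrixSmartPtr.get(), HEIGHT * WIDTH); })`, where `sink` is a `volatile` variable (an assumption of this sketch) so that the optimizer cannot discard the loop whose time is being measured.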

Evgeny · Answer 2 · score -1 · answered Apr 07 '21 at 16:19

Smart pointers are at a disadvantage compared to raw pointers, as the debug-mode timings show. In release mode, the compiler optimizes the code and caches the value of matrixSmartPtr.get().

You assume that the implementation is the same in debug and release mode. Many compilers include extra code in debug mode to make debugging easier and bugs more obvious, e.g. a nullptr check. – Richard Critten Apr 07 '21 at 16:21
Measuring performance in debug mode is futile, as the purpose of debug mode is to find bugs, not to measure performance. – Richard Critten Apr 07 '21 at 17:06
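If the two loops still need to be compared in an unoptimized build, it may help to keep the per-element work identical. Without inlining, each `matrixSmartPtr.get()` in the question's loops is typically a real function call, so fetching the pointer once before the loop removes that difference. A minimal sketch, assuming a hypothetical helper `copy_into` that does not appear in the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: copies n bytes from src into a buffer owned by a
// unique_ptr. get() is called once, outside the loop, so the loop body
// touches only a plain pointer, exactly like the raw-pointer version.
inline void copy_into(const std::unique_ptr<std::uint8_t[]>& dst,
                      const std::uint8_t* src,
                      std::size_t n)
{
    assert(dst != nullptr && src != nullptr);
    std::uint8_t* raw = dst.get();  // single call, hoisted out of the loop
    for (std::size_t i = 0; i < n; ++i)
        raw[i] = src[i];
}
```

Ownership stays with the `unique_ptr`; the hoisted raw pointer is only a non-owning view that must not outlive the buffer.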
