Could an optimizing compiler remove all runtime costs from std::unique_ptr?

Question

Reading about std::unique_ptr at http://en.cppreference.com/w/cpp/memory/unique_ptr, my naive impression is that a smart enough compiler could replace correct uses of unique_ptr with bare pointers and just insert a delete where each unique_ptr is destroyed. Is this actually the case? If so, do any of the mainstream optimizing compilers actually do this? If not, would it be possible to write something with some or all of unique_ptr's compile-time safety benefits that could be optimized to have no runtime cost (in space or time)?

Note to those (properly) worried about premature optimization: The answer here won't stop me from using std::unique_ptr, I'm just curious if it's a really awesome tool or just an awesome one.

EDIT 2013/07/21 20:07 EST:

OK, so I tested with the following program (please let me know if there's something wrong with this):

#include <climits>
#include <chrono>
#include <memory>
#include <iostream>

static const size_t iterations = 100;

int main (int argc, char ** argv) {
    std::chrono::steady_clock::rep smart[iterations];
    std::chrono::steady_clock::rep dumb[iterations];
    volatile int contents;
    for (size_t i = 0; i < iterations; i++) {
        auto start = std::chrono::steady_clock::now();
        {
            std::unique_ptr<int> smart_ptr(new int(5));
            for (unsigned int j = 0; j < UINT_MAX; j++)
                contents = *smart_ptr;
        }
        auto middle = std::chrono::steady_clock::now();
        {
            int *dumb_ptr = new int(10);
            try {
                for (unsigned int j = 0; j < UINT_MAX; j++)
                    contents = *dumb_ptr;
                delete dumb_ptr;
            } catch (...) {
                delete dumb_ptr;
                throw;
            }
        }
        auto end = std::chrono::steady_clock::now();
        smart[i] = (middle - start).count();
        dumb[i] = (end - middle).count();
    }
    // Zero-initialize the accumulators; reading them uninitialized is undefined behavior.
    std::chrono::steady_clock::rep smartAvg = 0;
    std::chrono::steady_clock::rep dumbAvg = 0;
    for (size_t i = 0; i < iterations; i++) {
        smartAvg += smart[i];
        dumbAvg += dumb[i];
    }
    smartAvg /= iterations;
    dumbAvg /= iterations;

    std::cerr << "Smart: " << smartAvg << " Dumb: " << dumbAvg << std::endl;
    return contents;
}

Compiling with g++ 4.7.3 using g++ --std=c++11 -O3 test.cc gave Smart: 1130859 Dumb: 1130005, which means the smart pointer is within 0.076% of the dumb pointer, which is almost surely noise.

What *else* would a compiler possibly do?! A unique pointer *is* just a single pointer, as far as the data content of the class is concerned. — Kerrek SB, Jul 21 '13 at 21:55
What runtime costs does a `unique_ptr` even have? As for space, `sizeof(myuniqueptr)` vs `sizeof(myptr)` are exactly the same for me, 8 bytes for `int`. — Rapptz, Jul 21 '13 at 21:56
Well, reading http://stackoverflow.com/questions/8138284/about-unique-ptr-performances suggests that some (early?) implementations weren't that way... — Shea Levy, Jul 21 '13 at 22:00
@SheaLevy: Those tests were performed on ***unoptimized builds***. Profiling without optimization is exceedingly pointless. — Nicol Bolas, Jul 21 '13 at 22:00
Specifically this answer to that question http://stackoverflow.com/a/12810087/636917 — Shea Levy, Jul 21 '13 at 22:01
@NicolBolas OK, fair enough. I just don't have a general sense right now of what can and can't be optimized, that's why I asked. — Shea Levy, Jul 21 '13 at 22:01
As a concession to my earlier comment, it's certainly *possible* to write a less efficient implementation (as provided in that linked question, namely where an offset adjustment is required for dereferencing), and a *stateful deleter* might cause such a less efficient implementation if the implementation doesn't check whether the deleter is empty (and apply empty-base optimization only when appropriate). — Kerrek SB, Jul 21 '13 at 22:05
In case the downvoters are still around, any chance you're willing to explain why? I'd like to avoid posting unwanted questions if possible. — Shea Levy, Jul 21 '13 at 22:08
It seems to me that as of GCC 4.8.1, `std::unique_ptr` is still implemented as an `std::tuple`. — Rapptz, Jul 21 '13 at 22:16
@SheaLevy: Answers go in the *answer section*, not in the question. That's one reason. — Nicol Bolas, Jul 22 '13 at 01:24
The code proves nothing. Volatile will tell the compiler to preserve the loops, but unique_ptr creation and dereferencing will be replaced by repeatedly moving a constant (5 in the case of the first loop) into *contents. You can see this in the generated assembly. So both loops will spend the same time, but not because unique_ptr is "cheap". — Cattus, Sep 08 '16 at 12:46

score 5 · Accepted Answer · answered Jul 21 '13 at 21:59

That is certainly what I would expect from any reasonably competent compiler: unique_ptr is just a wrapper around a plain pointer, with a destructor that calls delete, so the machine code generated by the compiler for:

X *p = new X;
... do stuff with p;
delete p;

and

unique_ptr<X> p(new X);
... do stuff with p;

will be exactly the same code.

Mats Petersson

More like `X *p; try { p = new X; /* ... */ } catch(...) { delete p; throw; }`... – Kerrek SB Jul 21 '13 at 22:00
Does `unique_ptr` really need to `try/catch`? Surely, in both cases, a failure to allocate in `new` will just throw out of the whole function, with either the plain pointer `p` or the smart pointer `p` not "constructed" (in a loose sense of constructed in the former case). I guess if the constructor of `X` throws, you need to worry about that... – Mats Petersson Jul 21 '13 at 22:07
It's about exceptions in the rest of your code! (So I should have said `x *p = new X; try { /* ... */ } catch(...) { delete p; throw; }`.) – Kerrek SB Jul 21 '13 at 22:09
Right, so I was hoping to not have any of those... ;) – Mats Petersson Jul 21 '13 at 22:15

score 4 · Answer 2 · answered Jul 22 '13 at 14:43

Strictly speaking, the answer is no.

Recall that unique_ptr is a template parametrized not only on the type of pointer but also on the type of the deleter. Its declaration is:

template <class T, class D = default_delete<T>> class unique_ptr;

In addition unique_ptr<T, D> contains not only a T* but also a D. The code below (which compiles on MSVC 2010 and GCC 4.8.1) illustrates the point:

#include <memory>

template <typename T>
struct deleter {
    char filler;
    void operator()(T* ptr) {}
};

int main() {
    static_assert(sizeof(int*) != sizeof(std::unique_ptr<int, deleter<int>>), "");
    return 0;
}

When you move a unique_ptr<T, D> the cost is not only that of copying the T* from source to target (as it would be with a raw pointer) since it must also copy/move a D.

It's true that smart implementations can detect if D is empty and has a copy/move constructor that doesn't do anything (this is the case for default_delete<T>) and, in that case, avoid the overhead of copying a D. In addition, they can save memory by not adding any extra bytes for D.

unique_ptr's destructor must check whether the T* is null before calling the deleter. For default_delete<T>, I believe the optimizer might eliminate this test, since it's OK to delete a null pointer.

However, there is one extra thing that std::unique_ptr<T, D>'s move constructor must do and T*'s doesn't. Since the ownership is passed from source to target, the source must be set to null. Similar arguments hold for assignments of unique_ptr.

Having said that, for default_delete<T> the overhead is so small that I believe it would be very difficult to detect by measurement.

I think it is optimal enough: as long as sizeof(default_delete) and operator* add zero overhead, unique_ptr is almost a perfect tool for guarding memory held by a raw pointer. — StereoMatching, Nov 03 '13 at 21:56
