How to transfer unique_ptr ownership efficiently through many layers of calls in an async scenario

Question

Let me start with background context.

Often, in server-side programming, a function call may go async (i.e. shift the work to another thread in a callback because of IO). The pattern is that function A calls B, B calls C, and eventually, many layers down in function Z, we may decide to go async.

Function A needs to transfer ownership of an object managed by a unique_ptr, using std::move all the way down to function Z. However, in my measurements moving a unique_ptr takes 40 nsec (vs 20 nsec for passing a shared_ptr by value, or 1 nsec for passing a raw pointer). For an extremely perf-sensitive server application, we may not want to move a unique_ptr through so many layers of functions. For similar reasons, we don't want to use shared_ptr (more importantly, shared_ptr always muddles ownership).

To be clear, we can use raw pointers and manage the resource ourselves. We always have that option and it could ultimately be the best option. But for the sake of argument, I think there is another option that I want to get people's opinion on.

So the proposed pattern is to pass the raw pointer obtained from the unique_ptr all the way from function A to Z. In function Z, where I need to go async, I create a new unique_ptr from the passed raw pointer and hand it to the new thread (i.e. I capture the raw pointer in a lambda, create the new unique_ptr in the lambda's body, and execute the lambda on the new thread). Back in function A, if no exception was thrown (i.e. the transfer succeeded), I call release on the original unique_ptr so there is no double delete. If an exception was thrown, I do not call release, so the unique_ptr in A frees the resource. This way, I avoid a chain of unique_ptr moves and can still transfer ownership. See the example below.

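#include <iostream>
#include <memory>
#include <thread>

struct MyObj {};  // placeholder; the real type is not shown in the question

void Foo1(MyObj* po)
{
  // The new thread adopts the raw pointer; u deletes the object when the lambda returns.
  auto l = [po]() { auto u = std::unique_ptr<MyObj>(po); };
  std::thread t(l);
  t.join();
}

int main()
{
  auto p1 = std::make_unique<MyObj>();
  Foo1(p1.get());
  std::cout << "i am here\n";
  // No exception so far: give up ownership without deleting, to avoid a double delete.
  // If Foo1 threw before the hand-off, p1 would still own the object and delete it.
  p1.release();
}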
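For comparison, one alternative that may be worth weighing against the raw-pointer-plus-release pattern is to pass the unique_ptr by rvalue reference through the intermediate layers. In the sketch below, std::move in the forwarding layers is only a cast, so the single real move happens at the innermost layer; if something throws before that point, the caller still owns the object. The names LayerA, LayerB, LayerZ and DemoForwardByReference, and the MyObj definition, are hypothetical and not taken from the question.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <utility>

struct MyObj { int value = 42; };  // hypothetical payload type

// Innermost layer: the only place where a real move (pointer copy + null) happens.
int LayerZ(std::unique_ptr<MyObj>&& p) {
  int seen = 0;
  // The lambda takes ownership through an init-capture and runs on a new thread.
  std::thread t([owned = std::move(p), &seen] { seen = owned->value; });
  t.join();  // after join, reading seen from this thread is safe
  return seen;
}

// Intermediate layers forward a reference; no unique_ptr object is constructed here.
int LayerB(std::unique_ptr<MyObj>&& p) { return LayerZ(std::move(p)); }
int LayerA(std::unique_ptr<MyObj>&& p) { return LayerB(std::move(p)); }

// Returns true if ownership left the caller and the worker thread saw the object.
bool DemoForwardByReference() {
  auto p = std::make_unique<MyObj>();
  int seen = LayerA(std::move(p));
  return p == nullptr && seen == 42;
}
```

With this shape the caller needs no manual release call: the unique_ptr in the caller is either emptied by the move inside LayerZ or left intact to clean up on its own.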

If you have a question to ask, you'd better add it before this Question gets closed for being "unclear what you're asking" — Jeremy Friesner, Apr 07 '21 at 22:03
_"... move unique_ptr is 40 nsec (vs 20 nsec passing shared_ptr by value..."_ that's strange, moving unique_ptr is 2 assignments, whereas passing shared_ptr by value will involve a lock to update the reference count as well as creating the new shared_ptr. — Richard Critten, Apr 07 '21 at 22:05
Please explain how you measured moving `unique_ptr` as taking twice as long as `shared_ptr`. There's no reason why, in an optimized build, moving an ordinary `unique_ptr` should be slower than passing a raw pointer. — alter_igel, Apr 07 '21 at 22:06
Why aren't you moving the `unique_ptr` into the lambda via capture? — alter_igel, Apr 07 '21 at 22:06
@RichardCritten: "*passing shared_ptr by value will involve a lock to update the reference count*" No `shared_ptr` uses locks for incrementing the reference count. Also, move-construction doesn't change reference counts. — Nicol Bolas, Apr 07 '21 at 22:12
@alter igel, sorry I have one more requirement, I need to be able to push the lambda to containers. if I capture unique_ptr, I can't push it because it requires it supports copy constructor which unique_ptr doesn't support. It is a well known limitation. — Kenneth, Apr 07 '21 at 22:13
@NicolBolas re the lock - making the reference count multi-thread safe; re move-construction - OP: _"... passing shared_ptr by value..."_ — Richard Critten, Apr 07 '21 at 22:14
@RichardCritten: Passing by value means you're passing a value, not necessarily copying one. You can move by value. And no, you only need atomic increments to make `shared_ptr` thread-safe. Which is what every implementation uses. They're not "locks". — Nicol Bolas, Apr 07 '21 at 22:15
@Kenneth: "*it requires it supports copy constructor*" What container imposes copy-constructible on its type? Not even `vector` requires that. Also, how do you store a lambda of all things in a `vector`? — Nicol Bolas, Apr 07 '21 at 22:16
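To illustrate the container point in the comment above: std::vector accepts move-only element types, so the limitation comes from std::function (which requires a copyable callable), not from the container. The sketch below uses a small hand-written move-only wrapper; Task, RunQueuedTasks and MyObj are hypothetical names introduced only for this example. std::packaged_task and, in C++23, std::move_only_function are standard alternatives to writing such a wrapper.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct MyObj { int value = 7; };  // hypothetical payload type

// Minimal move-only, type-erased callable returning int (illustrative only).
class Task {
  struct Base {
    virtual ~Base() = default;
    virtual int Call() = 0;
  };
  template <class F>
  struct Impl : Base {
    F f;
    explicit Impl(F fn) : f(std::move(fn)) {}
    int Call() override { return f(); }
  };
  std::unique_ptr<Base> impl_;  // makes Task movable but not copyable

 public:
  template <class F>
  explicit Task(F f) : impl_(std::make_unique<Impl<F>>(std::move(f))) {}
  int operator()() { return impl_->Call(); }
};

// Queues a lambda that owns a unique_ptr, then runs everything in the queue.
int RunQueuedTasks() {
  std::vector<Task> queue;
  auto p = std::make_unique<MyObj>();
  queue.emplace_back([owned = std::move(p)] { return owned->value; });
  int total = 0;
  for (auto& task : queue) total += task();
  return total;
}
```

The wrapper costs one heap allocation per task, which may or may not matter for the workload described in the question.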
@RichardCritten: You move into the function: `func_that_takes_by_value(std::move(some_value))`. — Nicol Bolas, Apr 07 '21 at 22:17
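A minimal sketch of what "moving into a by-value parameter" looks like in practice. Consume, Forward, DemoPassByValue and MyObj are hypothetical names; each by-value layer move-constructs its parameter, which for an ordinary unique_ptr amounts to copying one pointer and nulling the source.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct MyObj { int value = 5; };  // hypothetical payload type

// Sink: owns p for the duration of the call and deletes the object on return.
int Consume(std::unique_ptr<MyObj> p) { return p->value; }

// Intermediate layer: takes ownership by value and passes it on.
int Forward(std::unique_ptr<MyObj> p) { return Consume(std::move(p)); }

// Returns true if the caller's pointer was emptied and the sink saw the object.
bool DemoPassByValue() {
  auto p = std::make_unique<MyObj>();
  int v = Forward(std::move(p));
  return p == nullptr && v == 5;
}
```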
@alterigel you can look at https://github.com/kennthhz/testingground/blob/master/src/smart_ptr_example.cc for how I compare passing shared_ptr, unique_ptr and raw pointers. I just use steady_clock and call a series of functions with std::move(unique_ptr). — Kenneth, Apr 07 '21 at 22:17
@TedLyngmo yes, I am soliciting opinions. I know this is not a black and white thing. — Kenneth, Apr 07 '21 at 22:18
@NicolBolas the OP specified move for the unique_ptr and pass by value for the shared_ptr - I assumed he used different terms because he is passing the pointers by 2 different methods/techniques. — Richard Critten, Apr 07 '21 at 22:18
@RichardCritten: Then it's not a proper comparison if you're not doing the same thing on both sides. Equally importantly, there is *no way* that two atomic increments & copying two pointers and nulling two out is *faster* than copying a single pointer and nulling out the other one. So obviously the OP is not timing their code correctly. — Nicol Bolas, Apr 07 '21 at 22:20
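The copy-versus-move distinction discussed above can be observed through use_count: copying a shared_ptr increments the shared reference count, while moving one leaves the count unchanged. The function names and MyObj below are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct MyObj {};  // hypothetical payload type

// Copying creates a second owner, so the reference count is incremented.
long CountAfterCopy() {
  auto a = std::make_shared<MyObj>();
  auto b = a;
  return a.use_count();
}

// Moving transfers the existing ownership; the count is not touched.
long CountAfterMove() {
  auto a = std::make_shared<MyObj>();
  auto b = std::move(a);
  return b.use_count();
}
```

CountAfterCopy returns 2 and CountAfterMove returns 1, which is why a timing comparison needs to state whether each shared_ptr argument was copied or moved.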
@RichardCritten look at this link on limitation I mentioned https://stackoverflow.com/questions/25421346/how-to-create-an-stdfunction-from-a-move-capturing-lambda-expression — Kenneth, Apr 07 '21 at 22:21
It seems my question relies on my benchmark showing that passing std::move(unique_ptr) through a series of function calls is more expensive than passing a shared_ptr by value or a raw pointer. But several of you question my benchmark data. My code is at https://github.com/kennthhz/testingground/blob/master/src/smart_ptr_example.cc I use the same approach for raw, shared_ptr and unique_ptr. Did I miss anything obvious? See MeasurePassUniquePtr. — Kenneth, Apr 07 '21 at 22:29
@Kenneth [This is what I got](https://quick-bench.com/q/qJDZJaenJMp3084k6AT5m33UKgA) - In my test, `shared_ptr` is 120 times slower. Moving a `unique_ptr` is just a tad slower than passing the raw pointer. I must admit that I didn't look closely at the tests you made. Perhaps you can make something in quick-bench too? Otherwise you might get tricked by the optimizer removing parts of the tests if it can see that it doesn't have any observable effect. — Ted Lyngmo, Apr 08 '21 at 15:52
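As a rough sketch of the optimizer concern raised above, a hand-rolled steady_clock timing loop can route its result through a volatile variable and mark the call chain as non-inlined, so the compiler cannot discard or collapse the work being measured. Everything here (NOINLINE, Payload, g_sink, Leaf, Middle, Top, TimeUniquePtrChainNs) is hypothetical and is not the code from the linked repository or the quick-bench link. Note that the loop also times one allocation and one deallocation per iteration, so it does not isolate the cost of the moves.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <utility>

// Compiler-specific "do not inline" markers; empty on unknown compilers.
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE
#endif

struct Payload { int value = 1; };  // hypothetical payload type

// Writes to a volatile object are observable, so they cannot be optimized away.
volatile int g_sink = 0;

NOINLINE void Leaf(std::unique_ptr<Payload> p) { g_sink = g_sink + p->value; }
NOINLINE void Middle(std::unique_ptr<Payload> p) { Leaf(std::move(p)); }
NOINLINE void Top(std::unique_ptr<Payload> p) { Middle(std::move(p)); }

// Returns elapsed nanoseconds for `iterations` passes through the call chain.
long long TimeUniquePtrChainNs(int iterations) {
  using Clock = std::chrono::steady_clock;
  auto start = Clock::now();
  for (int i = 0; i < iterations; ++i) {
    Top(std::make_unique<Payload>());
  }
  auto stop = Clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A dedicated harness such as the one behind quick-bench is likely to give more trustworthy numbers than a loop like this; the sketch only shows the kind of precautions a manual measurement needs.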
