How to return large data efficiently in C++11

Question

I'm really confused about returning large data in C++11. What is the most efficient way? Here are the relevant functions:

void numericMethod1(vector<double>& solution,
                    const double input);

void numericMethod2(pair<vector<double>,vector<double>>& solution1,
                    vector<double>& solution2,
                    const double input1,
                    const double input2);

And here is how I use them:

int main()
{
    // apply numericMethod1
    double input = 0;
    vector<double> solution;
    numericMethod1(solution, input);

    // apply numericMethod2
    double input1 = 1;
    double input2 = 2;
    pair<vector<double>,vector<double>> solution1;
    vector<double> solution2;
    numericMethod2(solution1, solution2, input1, input2);

    return 0;
}

The question is: is std::move() useless in the following implementation?

Implementation:

void numericMethod1(vector<double>& solution,
                    const double input)
{
    vector<double> tmp_solution;

    for (...)
    {
        // some operations on tmp_solution;
        // afterwards this vector becomes very large
    }

    solution = std::move(tmp_solution);
}

void numericMethod2(pair<vector<double>,vector<double>>& solution1,
                    vector<double>& solution2,
                    const double input1,
                    const double input2)
{
    vector<double> tmp_solution1_1;
    vector<double> tmp_solution1_2;
    vector<double> tmp_solution2;

    for (...)
    {
        // some operations on tmp_solution1_1, tmp_solution1_2 and tmp_solution2;
        // afterwards the three vectors become very large
    }

    solution1.first = std::move(tmp_solution1_1);
    solution1.second = std::move(tmp_solution1_2);
    solution2 = std::move(tmp_solution2);
}

If they are useless, how can I deal with these large return values without copying them many times? Feel free to change the API!

UPDATE

Thanks to Stack Overflow and these answers, and after digging into related questions, I understand this problem better. Because of RVO I changed the API, and for clarity I no longer use std::pair. Here is my new code:

struct SolutionType
{
    vector<double> X;
    vector<double> Y;
};

SolutionType newNumericMethod(const double input1,
                              const double input2);

int main()
{
    // apply newNumericMethod
    double input1 = 1;
    double input2 = 2;
    SolutionType solution = newNumericMethod(input1, input2);

    return 0;
}

SolutionType newNumericMethod(const double input1,
                              const double input2)
{
    SolutionType tmp_solution; // this will call the default constructor, right?
    // since the names are long, I create aliases
    vector<double> &x = tmp_solution.X;
    vector<double> &y = tmp_solution.Y;

    for (...)
    {
        // some operations on x and y;
        // afterwards these two vectors become very large
    }

    return tmp_solution;
}

How can I tell whether RVO happened? Or how can I ensure that RVO happens?

These are not useless in this case, but why don't you use `solution1` and `solution2` directly? Or references to them? — Holt, May 09 '16 at 14:09

Vittorio Romeo · Accepted Answer · 2016-05-09T14:26:00.247

9

Return by value, rely on RVO (return value optimization).

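// Note: deducing the return type from a plain `auto` requires C++14;
// in C++11, spell out the return type (e.g. vector<huge_thing>).
auto make_big_vector()
{
    vector<huge_thing> v1;
    // fill v1

    // explicit move is not necessary here
    return v1;
}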

auto make_big_stuff_tuple()
{
    vector<double> v0;
    // fill v0

    vector<huge_thing> v1;
    // fill v1

    // explicit move is necessary for make_tuple's arguments,
    // as make_tuple uses perfect-forwarding:
    // http://en.cppreference.com/w/cpp/utility/tuple/make_tuple

    return std::make_tuple(std::move(v0), std::move(v1));
}

auto r0 = make_big_vector();
auto r1 = make_big_stuff_tuple();

I would change the API of your functions to simply return by value.
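One way to see what the compiler actually does is to return a type that counts its own copies and moves. The following is only a sketch: `Tracer`, `make_tracer`, `make_pair_by_copy`, `make_pair_by_move`, `copies_caused_by` and `check_copy_counts` are hypothetical names that do not appear in the question. It illustrates that returning a local by name never needs a copy, and that lvalue arguments to `std::make_tuple` are copied unless they are wrapped in `std::move`.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Hypothetical helper type: counts how often it is copied or moved.
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
    static void reset() { copies = 0; moves = 0; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

// Returning a local by name: the copy is elided (NRVO) or replaced by a
// move, so the copy constructor is never called.
Tracer make_tracer() {
    Tracer t;
    return t;
}

// Lvalue arguments are copied into the tuple (one copy per element).
std::tuple<Tracer, Tracer> make_pair_by_copy() {
    Tracer a, b;
    return std::make_tuple(a, b);
}

// std::move turns the arguments into rvalues, so they are moved instead.
std::tuple<Tracer, Tracer> make_pair_by_move() {
    Tracer a, b;
    return std::make_tuple(std::move(a), std::move(b));
}

// Runs one factory function and reports how many copies it caused.
template <typename Factory>
int copies_caused_by(Factory make) {
    Tracer::reset();
    auto result = make();
    (void)result;  // only the counters matter here
    return Tracer::copies;
}

// Self-check that a caller could run from main().
void check_copy_counts() {
    assert(copies_caused_by(make_tracer) == 0);
    assert(copies_caused_by(make_pair_by_copy) == 2);
    assert(copies_caused_by(make_pair_by_move) == 0);
}
```

The sketch checks only the copy counter on purpose: the number of moves depends on how much elision the compiler performs, so it can differ between compilers and flags.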

edited May 09 '16 at 14:26

answered May 09 '16 at 14:08

Vittorio Romeo


How can I know when it uses 'return value optimization'? Oh, you just added the link. Thanks, I'll read it carefully. – Regis May 09 '16 at 14:12
@Reigs: it has been discussed many times on StackOverflow before, with many beautiful and informative answers - make sure to do some research here as well – Vittorio Romeo May 09 '16 at 14:15
You also can get copy elision or move elision. – NathanOliver May 09 '16 at 14:17
@LightnessRacesinOrbit: [seems like it](https://godbolt.org/g/8NJjVr) - the arguments to make_tuple need to be moved though, updating answer. – Vittorio Romeo May 09 '16 at 14:23
This is a good answer. If you don't return by value, you'll lose RVO. – erip May 09 '16 at 14:25
Thank you! I read the link. Does that mean that, if I use a std container or the compiler supports it, returning by value is better than returning through a reference parameter? Because then in numericMethod1 I don't need to do the std::swap() or std::move() things. Am I right? – Regis May 09 '16 at 14:50
@Reigs: generally, yes - "return via out reference parameter" shouldn't be used in modern C++ code. Returning by value and relying on RVO should be your default choice. `std::move` may be necessary in situations where you need to pass your values to some helper function like `make_tuple`, but it's not necessary if you're directly returning a value from a function. – Vittorio Romeo May 09 '16 at 15:34
@VittorioRomeo: Return by value is the most natural way. But I'm wondering why some library functions prefer 'return via out reference parameter'. Is returning void a coding style, or are there historical reasons? OpenCV example: void cv::dilate(cv::Mat& input, cv::Mat& output); http://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#dilate – Regis May 09 '16 at 16:12
@Reigs: this solution works only for C++11 onwards. Move semantics were not present before C++11. OpenCV was released in 2000, and it probably still supports pre-C++11 codebases. – Vittorio Romeo May 09 '16 at 16:15
@VittorioRomeo: Got that! I need to add the '-std=c++11' compile flag to enable this optimization. I really appreciate your answer and your patience. I learnt a lot! – Regis May 09 '16 at 16:22
@vitt return by reference has one big advantage: ease of reusing a buffer. Return by value requires work to enable that. – Yakk - Adam Nevraumont May 09 '16 at 17:31
How sure are you that RVO will actually occur in this scenario, and how efficient it will be? – Crashworks May 10 '16 at 01:35
@VittorioRomeo Could you please look at my question again? I updated it with a solution and posted a new question at the end. – Regis May 10 '16 at 01:49
@Reigs: I think it would be more appropriate to post a new question – Vittorio Romeo May 11 '16 at 15:30
@Yakk: good point. I would probably create a `fillX` function that takes a reference to an existing buffer, and a `createX` function that internally calls the fill function and returns by value. – Vittorio Romeo May 11 '16 at 15:32
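
The fill/create pairing suggested in the last comment could look roughly like the sketch below. The names `fillSquares`, `createSquares` and `check_buffer_reuse`, and the "squares" computation itself, are made up for illustration and stand in for the real numeric method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "fill" variant: writes into a caller-owned buffer so that
// its allocation can be reused across calls.
void fillSquares(std::vector<double>& out, std::size_t n) {
    out.clear();     // removes the elements but keeps the capacity
    out.reserve(n);  // allocates only if the capacity is too small
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<double>(i) * static_cast<double>(i));
    }
}

// Hypothetical "create" variant: convenience wrapper that returns by value.
std::vector<double> createSquares(std::size_t n) {
    std::vector<double> result;
    fillSquares(result, n);
    return result;  // elided or moved, not deep-copied
}

// Self-check that a caller could run from main().
void check_buffer_reuse() {
    std::vector<double> buffer;
    fillSquares(buffer, 1000);
    const double* first_allocation = buffer.data();

    fillSquares(buffer, 500);  // fits into the existing allocation
    assert(buffer.data() == first_allocation);
    assert(buffer.size() == 500);

    assert(createSquares(4).back() == 9.0);
}
```

Callers that run the method in a loop can keep one buffer alive and call the fill variant, while one-off callers use the create variant and rely on return by value.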

101010 · Answer 2 · 2016-05-09T14:19:59.647

3

You could use the std::vector::swap member function, which exchanges the contents of the container with those of another vector. It does not invoke any move, copy, or swap operations on individual elements.

solution1.first.swap(tmp_solution1_1);
solution1.second.swap(tmp_solution1_2);
solution2.swap(tmp_solution2);

edit:

These statements are not useless,

solution1.first = std::move(tmp_solution1_1);
solution1.second = std::move(tmp_solution1_2);
solution2 = std::move(tmp_solution2);

they invoke the move assignment operator of std::vector, operator=(vector&&), which takes over the contents of the vector on the right-hand side.
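
A small sketch can make this visible by comparing buffer addresses before and after the assignment. The function name `move_assignment_reuses_buffer` is hypothetical, and the sketch assumes the default allocator, for which mainstream standard library implementations hand over the buffer instead of copying the elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns true if move assignment handed the source's heap buffer to the
// target instead of copying the elements (assumes the default allocator).
bool move_assignment_reuses_buffer() {
    const std::size_t count = 1000000;
    std::vector<double> tmp_solution(count, 1.0);
    const double* buffer = tmp_solution.data();
    assert(buffer != nullptr);

    std::vector<double> solution(10, 0.0);
    solution = std::move(tmp_solution);  // move assignment, no element copies

    return solution.data() == buffer && solution.size() == count;
}
```

After the move, `tmp_solution` is left in a valid but unspecified state, so the sketch does not inspect it again.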

edited May 09 '16 at 14:19

answered May 09 '16 at 14:05

101010


Thanks for your quick reply. Can you explain more, please? Does std::move() not work once solution1.second has been constructed? – Regis May 09 '16 at 14:08

Mr.C64 · Answer 3 · 2016-05-09T14:45:55.900

2

When you have large data like a very big vector<double>, you can still return it by value, since C++11's move semantics will kick in for std::vector, so returning it from your function will just be some kind of pointer assignment (since vector<double>'s content is typically heap-allocated under the hood).

So I would just do:

// No worries in returning large vectors by value
std::vector<double> numericMethod1(const double input)
{
    std::vector<double> result;

    // Compute your vector<double>'s content
    ...

    // NOTE: Don't call std::move() here.
    // A simple return statement is just fine.
    return result;
}

(Note that other kinds of optimization already available in C++98/03, such as RVO/NRVO, can be applied as well, depending on the particular C++ compiler.)
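
To illustrate the fallback, here is a sketch of a function that returns one of two locals, a pattern where compilers usually cannot apply NRVO. The names `pick_solution` and `returned_without_deep_copy` are hypothetical, and the out-parameter exists only to record the buffer address for the check. Whether the return is elided or moved, the result should own the very buffer the local owned.

```cpp
#include <cassert>
#include <vector>

// Returns one of two locals. The reference parameter is instrumentation
// only: it records the address of the chosen local's buffer.
std::vector<double> pick_solution(bool use_first,
                                  const double*& buffer_before_return) {
    std::vector<double> first(1000, 1.0);
    std::vector<double> second(1000, 2.0);
    if (use_first) {
        buffer_before_return = first.data();
        return first;   // no std::move needed: treated as an rvalue
    }
    buffer_before_return = second.data();
    return second;      // same here
}

// True if the returned vector owns the buffer the local vector owned,
// which means no deep copy took place.
bool returned_without_deep_copy(bool use_first) {
    const double* buffer_before_return = nullptr;
    std::vector<double> result = pick_solution(use_first, buffer_before_return);
    assert(buffer_before_return != nullptr);
    return result.data() == buffer_before_return;
}
```

A deep copy would have to allocate a second buffer while the local was still alive, so matching addresses rule it out.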

Instead, if you have a method that returns multiple output values, then I'd use non-const references, just like in C++98/03:

void numericMethod2(pair<vector<double>,vector<double>>& output1,
                    vector<double>& output2,
                    vector<double>& output3,
                    ...
                    const double input1,
                    const double input2);

Inside the implementation, you can still use a valid C++98/03 technique of "swap-timization", where you can just call std::swap() to swap local variables and output parameters:

#include <utility> // for std::swap

void numericMethod2(pair<vector<double>,vector<double>>& solution1,
                    vector<double>& solution2,
                    const double input1,
                    const double input2)

{
    vector<double> tmp_solution1_1;
    vector<double> tmp_solution1_2;
    vector<double> tmp_solution2;

    // Some processing to compute local solution vectors
    ...

    // Return output values to caller via swap-timization
    swap(solution1.first, tmp_solution1_1);
    swap(solution1.second, tmp_solution1_2);
    swap(solution2, tmp_solution2);
}

Swapping vectors typically swaps the vectors' internal pointers to the heap-allocated memory they own, so you just have pointer assignments, not deep copies, memory reallocations, or similar expensive operations.
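
The pointer exchange can be checked directly. This is a sketch with a hypothetical function name, `swap_exchanges_buffers`, that compares the buffer addresses before and after the swap.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns true if swapping two vectors merely exchanged their buffers.
bool swap_exchanges_buffers() {
    const std::size_t count = 1000000;
    std::vector<double> output(10, 0.0);      // plays the output parameter
    std::vector<double> tmp(count, 1.0);      // plays the local result
    const double* output_buffer = output.data();
    const double* tmp_buffer = tmp.data();
    assert(output_buffer != tmp_buffer);

    using std::swap;
    swap(output, tmp);  // constant time, no element is touched

    return output.data() == tmp_buffer &&
           tmp.data() == output_buffer &&
           output.size() == count;
}
```

One difference from move assignment is that the local ends up holding the output parameter's old buffer, which is released when the local goes out of scope.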

edited May 09 '16 at 14:45

answered May 09 '16 at 14:18

Mr.C64


Thank you, now I know this problem better. Lots of things to learn. – Regis May 09 '16 at 14:40
@Reigs: You're welcome. If you find some answers helpful, you may want to upvote them as a good StackOverflow citizen. – Mr.C64 May 09 '16 at 14:42
I'd say a method that returns multiple output values should wrap those values in a custom class, struct or tuple, and then just return an instance of that. – Christian Hackl May 09 '16 at 14:51
You are right! I'm going to earn more reputation so that I can upvote everyone who helped me. – Regis May 09 '16 at 14:53
@ChristianHackl: I'm not sure, I mean: this to me is more in the kingdom of personal preferences... What you suggest would add a potential artificial layer of grouping between stuff that might be somewhat unrelated, except for being all _"output values"_. – Mr.C64 May 09 '16 at 14:55
"When you have large data like a very big vector, you can still return it by value, since C++11's move semantics will kick in [...]" That's incorrect. When you return by value, the value will be copied. At sufficient optimization level the compiler will elide the copy so that return by value comes at no cost. There will only be a move when you explicitly ask for a move. You could return an rvalue reference and use `return std::move(...)`. But that will be less efficient since it involves a move operation. Return by value with copy elision has neither a move nor a copy operation. – TFM May 09 '16 at 15:04
@TFM: I believe you are wrong. If the compiler can apply NRVO or other optimizations great, else move semantics will be applied _without_ the explicit `std::move(...)` in the return statement. Anyway, I'm not a language lawyer, and I'm open to be proven wrong by some language lawyer with proper citation in the C++ standard documents. – Mr.C64 May 09 '16 at 15:08
@Mr.C64: As a matter of fact, you are right. My apologies. I'd still choose a different wording. To be able to efficiently return by value is not a C++11 feature and in general unrelated to move semantics. It's just that with C++11 there is yet another way to optimize – TFM May 09 '16 at 15:23
@TFM: No problem. With StackOverflow we can learn from each other. – Mr.C64 May 09 '16 at 15:30
@Mr.C64 Could you please look at my question again? I updated it with a solution and posted a new question at the end. – Regis May 10 '16 at 01:52

score 1 · Answer 4 · answered May 09 '16 at 14:18

First of all, why don't you use solution1 directly in numericMethod2? That is more direct.

Unlike std::array or a built-in array (obj[]), a std::vector does not store its elements on the stack but on the heap (you can check the standard library code; it uses operator new() a lot). So if the vector is only a temporary whose contents will be handed off somewhere else, use std::swap or std::move. The result of a function that returns by value is also an rvalue, so it can be moved from.

This holds for the other standard containers that allocate dynamically as well (std::map, std::set, std::deque, std::list, etc.).
