How to calculate the standard deviation with iterators and lambda functions

Question

I learned that the mean of each row of data stored in a std::vector<std::vector<double>> data can be calculated the following way:

void calculate_mean(std::vector<std::vector<double>>::iterator dataBegin,
                    std::vector<std::vector<double>>::iterator dataEnd, 
                    std::vector<double>& rowmeans) {
    auto Mean = [](std::vector<double> const& vec) {
                    return std::accumulate(vec.begin(), vec.end(), 0.0) / vec.size(); };
    std::transform(dataBegin, dataEnd, rowmeans.begin(), Mean);
}

I wrote a function that takes the begin and end iterators of the data vector and calculates the mean of each row; the results are stored in a std::vector<double>. My first question is how to handle the return value of a function when working with vectors. In this case I pass a reference (an alias) and so modify the vector I initialized before calling the function, so there is no copying back, which is nice. Is this good programming practice?

Second, my main question is how to adapt this function to calculate the standard deviation of each row in a similar way. I tried really hard, but it only produced a huge mess in which nothing works properly. If someone sees right away how to do that, I would be glad for any insight. Thank you.

Edit: Solution

So here is my solution to the problem. Given a std::vector<std::vector<double>> data(rows, std::vector<double>(columns)), where the data is stored in the rows, the following function calculates the sample standard deviation of each row.

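#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

void calculate_std(std::vector<std::vector<double>>::iterator dataBegin,
                   std::vector<std::vector<double>>::iterator dataEnd,
                   std::vector<double>& rowstds) {
    // Sample standard deviation of one row; assumes the row has at least two elements.
    auto test = [](std::vector<double> const& vec) {
        double sum = std::accumulate(vec.begin(), vec.end(), 0.0);
        double mean = sum / vec.size();
        double stdSum = 0.0;  // sum of squared deviations from the mean
        auto Std = [&](const double x) { stdSum += (x - mean) * (x - mean); };
        std::for_each(vec.begin(), vec.end(), Std);
        return std::sqrt(stdSum / (vec.size() - 1));
    };
    // rowstds must already hold at least as many elements as there are rows.
    std::transform(dataBegin, dataEnd, rowstds.begin(), test);
}

// Usage, inside a function where data is in scope.
// The result vector is named stds because a variable called std would hide namespace std.
std::vector<double> stds(data.size());
calculate_std(data.begin(), data.end(), stds);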

I tested it and it works just fine. If anyone has suggestions for improvement, please let me know. Is this piece of code also good performance-wise?

For the second part - "I tried really hard but it only gives a huge mess, where nothing is working properly." - show us what you have tried. — T.C., Feb 20 '15 at 17:32
@T.C I edited the question to include one of my attempts to figure out how it could work. It is really a mess, so I thought I would rather not post it. — nerdizzle, Feb 20 '15 at 18:43
I'm voting to close this question as off-topic because it is about code review and belongs on codereview.stackexchange.com — pmr, Feb 21 '15 at 15:00

Teddy Engel · Answer 1 · 2015-02-20T17:33:05.250

0

A common convention is to list a function's input parameters first, followed by input/output and output parameters. Output parameters (the ones the function writes its results to) are usually passed as a pointer or a reference. So your solution seems perfect from that point of view.

Source: Google's C++ coding conventions

edited Feb 20 '15 at 17:33

answered Feb 20 '15 at 17:26


score 0 · Answer 2 · edited May 23 '17 at 10:24

I mean in this case I make an Alias and modify in this way the vector I initialized before calling this function, so there is no copying back which is nice. So is this good programming practice?

No, you should use a local vector<double> variable and return by value. Any compiler worth using would optimize away the copying/moving, and any conforming C++11 compiler is required to perform a move if for whatever reason it cannot elide the copy/move altogether.
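As a rough illustration of that advice, here is a minimal sketch of the mean function rewritten to return its result by value. The name row_means and the use of const_iterator are assumptions made for this example, not part of the original code, and each row is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical by-value variant of calculate_mean (the name is illustrative).
// Assumes every row contains at least one element.
std::vector<double> row_means(std::vector<std::vector<double>>::const_iterator dataBegin,
                              std::vector<std::vector<double>>::const_iterator dataEnd) {
    std::vector<double> means;
    // Reserve up front so back_inserter never has to reallocate.
    means.reserve(static_cast<std::size_t>(std::distance(dataBegin, dataEnd)));
    std::transform(dataBegin, dataEnd, std::back_inserter(means),
                   [](std::vector<double> const& row) {
                       return std::accumulate(row.begin(), row.end(), 0.0) / row.size();
                   });
    return means;  // a named local: the copy may be elided, otherwise it is moved
}
```

A caller would then write something like auto means = row_means(data.cbegin(), data.cend()); without having to create and size the result vector first.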

Your code as written imposes additional requirements on the caller that are not obvious. For instance, rowmeans must contain enough elements to store the means, or undefined behavior results.
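One possible way to remove that hidden precondition, sketched here for the standard deviation case from the question, is to let the function build and return its own result vector. The name row_sample_stddevs, the choice to take the data by const reference, and the decision to return NaN for rows with fewer than two elements are all assumptions of this sketch, not something stated in the question.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper (the name is illustrative): sample standard deviation of each row.
// Rows with fewer than two elements yield NaN, an assumption made for this sketch.
std::vector<double> row_sample_stddevs(const std::vector<std::vector<double>>& data) {
    std::vector<double> stds;
    stds.reserve(data.size());
    // back_inserter grows the result, so the caller has nothing to pre-size.
    std::transform(data.begin(), data.end(), std::back_inserter(stds),
        [](std::vector<double> const& row) -> double {
            if (row.size() < 2) {
                return std::numeric_limits<double>::quiet_NaN();
            }
            const double mean = std::accumulate(row.begin(), row.end(), 0.0) / row.size();
            // Sum of squared deviations from the mean.
            const double sq = std::accumulate(row.begin(), row.end(), 0.0,
                [mean](double acc, double x) { return acc + (x - mean) * (x - mean); });
            return std::sqrt(sq / (row.size() - 1));
        });
    return stds;
}
```

The size check also avoids the unsigned wrap-around that row.size() - 1 would produce for an empty row.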
