26

I have a standard library container of large numbers, so large that they may cause overflow if I add them together. Let's pretend it's this container:

std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

I want to calculate the mean of this container, using std::accumulate, but I can't add all the numbers together. I'll just calculate it with v[0]/v.size() + v[1]/v.size() + .... So I set:

auto lambda = ...;
std::cout << std::accumulate(v.begin(), v.end(), 0, lambda) << std::endl;

Here is what I have tried so far, where -> indicates the output:

lambda = [&](int a, int b){return (a + b)/v.size();};  ->  1
lambda = [&](int a, int b){return a/v.size() + b/v.size();};  ->  1
lambda = [&](int a, int b){return a/v.size() + b;};  ->  10

How can I produce the correct mean such that the output will be 5?

EMBLEM
  • 2,207
  • 4
  • 24
  • 32
  • 2
    `5` is not the correct answer. – Ben Voigt Apr 16 '15 at 20:31
  • @BenVoigt It is if you're using integer division. – EMBLEM Apr 16 '15 at 20:33
  • 3
    Integer division is not used in calculation of a mean. Combined with `std::accumulate`, it's even worse -- it will ruin your partial sums. If you want the final result rounded according to the rules of integer division, you should say that explicitly in your question (and then you are not finding a mean). Otherwise your use of integer division looks like a bug to every reader. – Ben Voigt Apr 16 '15 at 20:36

6 Answers6

27

You shouldn't use integer to store the result:

The return type passed to the function accumulate:
T accumulate( InputIt first, InputIt last, T init, BinaryOperation op ); depends on the third parameter type: (T init) so you have to put there: 0.0 to get result as double.

#include <vector>
#include <algorithm>
#include <iostream>
#include <numeric>
using namespace std;
std::vector<int> v = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

int main()
{
    auto lambda = [&](double a, double b){return a + b / v.size(); };
    std::cout << std::accumulate(v.begin(), v.end(), 0.0, lambda) << std::endl;
}
AdamF
  • 2,501
  • 17
  • 30
  • @rpattabi Could you elaborate a bit more how to reproduce the warning without return type ? I'm not able to do it. – AdamF Jan 21 '18 at 19:09
  • Isn't the second type of your lambda the type of the vector's elements? Meaning `auto lambda = [&](double a, int b){//...` – Chris_128 Feb 14 '21 at 07:40
  • It can't be changed to int, because then the division is incorrect. If you would like to pass it as int then you should cast it to double later: `auto lambda = [&](double a, int b) {return a + (double)b / v.size(); };` – AdamF Feb 16 '21 at 08:06
  • Good point. Please consider doing this (maybe with a `static_cast(b)` in your answer. I think that is better than the current answer because it clearly shows which of the lambda's arguments comes from the accumulation's carry-over and which one is the vector's element. Additionally, that explicitly shows that a cast is going on and doesn't cast that implicitly in the lambda's argument. – Chris_128 Feb 16 '21 at 19:52
  • If `v` is global, you do not need to capture anything in your lambda function. Actually your lambda isn't capturing anything in your example. – Axel Krypton Mar 30 '23 at 09:02
8

This may not round quite as nicely, but it works even when there's no size() method on the container:

auto lambda = [count = 0](double a, int b) mutable { return a + (b-a)/++count; };

This takes advantage of a new C++14 features, initialized captures, to store state within the lambda. (You can do the same thing via capture of an extra local variable, but then its scope is local scope, rather than lifetime of the lambda.) For older C++ versions, you naturally can just put the count in the member variable of a struct and put the lambda body as its operator()() implementation.

To prevent accumulation of rounding error (or at least dramatically reduce it), one can do something like:

auto lambda = [count = 0, error = 0.0](double a, int b) mutable {
   const double desired_change = (b-a-error)/++count;
   const double newa = a + (desired_change + error);
   const double actual_change = newa - a;
   error += desired_change - actual_change;
   return newa;
};
Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
  • It's good to use this approximation formula for large data set, because the quality of double might not be enough in the original formula. – AdamF Apr 16 '15 at 20:41
  • @AdamF: One can keep track of an error term as well, to prevent rounding errors from accumulating. – Ben Voigt Apr 16 '15 at 20:42
  • Perfect. In the previous comment I also wanted to admit your first formula : ) I used it few times, it's powerful and we even don't need to keep this whole array in memory. – AdamF Apr 16 '15 at 21:01
  • @BenVoigt nice answer! You should mention that lambda capture expressions work only in C++14 – vsoftco Apr 16 '15 at 22:47
  • @vsoftco: I had been intending to do that, then got interested in the rounding-error. Thanks for reminding me. – Ben Voigt Apr 16 '15 at 22:57
  • Using an error term is roughly like using twice the precision. Another way to reduce the error is to reduce the depth of the expression tree. `accumulate` uses a left fold, which is the worst possible case (linear depth) – Arne Vogel Jun 20 '17 at 18:10
1

Your running "average" is the first parameter to the lambda, so the following is correct.

lambda = [&](int a, int b){return a + b/v.size();};
Ken Bloom
  • 57,498
  • 14
  • 111
  • 168
0

The three lambda functions you are using are not upto the mark.

lambda = [&](int a, int b){return (a + b)/v.size();};  ->  1
lambda = [&](int a, int b){return a/v.size() + b/v.size();};  ->  1
lambda = [&](int a, int b){return a/v.size() + b;};  ->  10

The parameter a used here carries the mean upto the particular index of vector at a given point of time.For instance the value of 'a' when the value of 'b' is 1 is 0.0, when 'b' becomes 2 at that instant it should be '0.1'. Then it is quite clear that in no case 'a' needs to be divided by v.size() everytime the lambda function is called.

Coming to the correct lambda function for the mentioned situation

lambda = [&](double x,double y){return x+y/v.size();}

Here we capture by reference just because we need value of v.size(),value of size of vector can be passed beforehand if known beforehand

The working program is

    #include<iostream>
    #include<numeric>
    #include<vector>
    using namespace std;

    int main(){
        vector<int> v(10);
        iota(v.begin(),v.end(),1);
        double x=accumulate(v.begin(),v.end(),0.0,[&](double x,double y) {return x+y/v.size();});
        cout << x << endl;
    }

P.S : 'iota' is used to initialise a range in an increasing fashion,Here it initialise the vector from 1 to 10

Pyder
  • 21
  • 4
0

I haven't seen this solution which doesn't bother with passing the vector's size since one's already controlling the range with v.begin(), v.end():

double mean = accumulate(v.begin(), v.end(), 0., [](double x, double y) { return x+y; }) / v.size();

It can be further improved with replacing v.size() by std::distance(start,end).

Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
slrx
  • 1
  • 1
0

Obviously you don't need to go to these lengths but you can create a simple "StatsCalculator"

struct StatsCalculator
{
   size_t count;
   double sum;
   double sumSq;
   
   double mean() const { return count ? sum/count : NaN(); }
   double variance() const { return count ? (sumSq-sum*sum/count)/count : NaN();  }
   std::tuple<double,double> meanAndVariance() { return { mean(), variance() }; 
   void addValue( double val ) { ++count; sum += val; sumSq += val*val; }
};

So how about your lambda. Create the StatsCalculator instance before you iterate. Then

auto myLambda = [](StatsCalculator* calculator, int value)
   { calculator->addValue(static_cast<double>(value)); return calculator; }

Then to iterate:

StatsCalculator calc;
double mean = std::accumulate(y.begin(), y.end(), &calc, myLambda)->mean();

and of course you could ask for a tuple of mean and variance.

CashCow
  • 30,981
  • 5
  • 61
  • 92