5

I have a function that takes a sample (a std::vector<double>) as input and computes its average. What is the best way to handle the case of an empty input vector?

My first idea is to throw an exception like in this snippet:

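#include <cstddef>
#include <stdexcept>
#include <vector>

double average(const std::vector<double>& sample)
{
   std::size_t sz = sample.size();
   // std::exception has no standard constructor taking a message,
   // so a standard subclass such as std::invalid_argument is used instead
   if (sz==0) throw std::invalid_argument("unexpected empty vector");

   double acc = 0;
   for (std::size_t i=0; i<sz; ++i) acc += sample[i];
   return acc/sz;
}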

But I think another solution could be to return NaN:

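#include <cstddef>
#include <limits>
#include <vector>

double average(const std::vector<double>& sample)
{
   std::size_t sz = sample.size();
   // an empty sample has no average: signal it with a quiet NaN
   if (sz==0) return std::numeric_limits<double>::quiet_NaN();

   double acc = 0;
   for (std::size_t i=0; i<sz; ++i) acc += sample[i];
   return acc/sz;
}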

I like the exception because it shows where the problem happened, whereas if I get a NaN in the final result of a long computation it will be harder to work out where the NaN originated. On the other hand, with NaN I like the possibility of returning a "special" double to signal that something unexpected happened.

Is there any other way to cope with the empty vector? Thank you.

Alessandro Jacopson
  • 2 tips: pass the vector by reference (&) and use Kahan summation. – Yakov Galka Sep 03 '11 at 13:33
  • @ybungalobill +1 for the first tip (I forgot the & I usually write). For the second tip I have a question: have you ever seen real code misbehave and fixed the problem with Kahan summation? – Alessandro Jacopson Sep 03 '11 at 14:04
  • Define 'misbehaving'. Floating-point calculations do not 'misbehave' (usually), they just gradually lose precision. Yes, it's common to sum 100 numbers and get a result far enough from the infinite-precision one that the difference shows in the output. – Yakov Galka Sep 03 '11 at 14:15
  • I am curious about direct experience (a "war story") related to numerical analysis: did it happen in code you were working on? How far was "far enough"? – Alessandro Jacopson Sep 03 '11 at 14:28
  • @uvts_cvs: Ask this as a new question (with a floating point tag). – Martin York Sep 03 '11 at 17:48
  • @uvts: well I exaggerate. Although you can get 2 decimal places error while summing 100 numbers it's not so common. Here is a real example, summing 1/n^2 with 32-bit precision. Without Kahan it converges to the wrong limit 1.6447253 near the 4200th term, with Kahan it gets to 1.6449339 on the 3600000th iteration. The real value is 1.644934066... – Yakov Galka Sep 03 '11 at 18:30
  • @Tux-D the question has already been asked here http://stackoverflow.com/questions/4940072/kahan-summation but IMHO without success... – Alessandro Jacopson Sep 06 '11 at 13:55
  • @ybungalobill Instead of Kahan, what about `std::accumulate( sample.begin(), sample.end(), 0.0 );` ? – Alessandro Jacopson Sep 06 '11 at 14:33
  • @uvts: well, it will be definitely better than the explicit loop you wrote, but it does exactly the same thing. – Yakov Galka Sep 06 '11 at 14:54
  • @ybungalobill You're right, I checked the C++ standard and it seems to me that doing Kahan would be non compliant. – Alessandro Jacopson Sep 07 '11 at 06:37
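
For reference, the Kahan summation mentioned in the comments above could look roughly like the sketch below. The function name kahan_sum is made up for illustration, and the code assumes IEEE 754 doubles and a compiler that does not reorder floating-point arithmetic (options such as -ffast-math may optimize the compensation away).

```cpp
#include <vector>

// Compensated (Kahan) summation: keeps a running correction term that
// recovers low-order bits lost when a small value is added to a large sum.
// kahan_sum is a hypothetical name, not something from the thread.
double kahan_sum(const std::vector<double>& values)
{
    double sum = 0.0;
    double compensation = 0.0;              // error carried over from previous additions
    for (double v : values) {
        double y = v - compensation;        // apply the stored correction to the next term
        double t = sum + y;                 // low-order bits of y may be lost here
        compensation = (t - sum) - y;       // recover what was lost (as a negative error)
        sum = t;
    }
    return sum;
}
```

An average would then be kahan_sum(sample) / sample.size(), with the empty case still handled separately. A plain loop or std::accumulate adds the terms one by one without the correction, which is why small terms can vanish entirely next to a large running sum.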

4 Answers

5

I DO think that mathematically the NaN would be more correct. In the end it's 0.0/0. Had it been a direct division, what would have happened?
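
To illustrate the "direct division" case, here is a minimal sketch with no empty-input check at all. The name average_unchecked is hypothetical, and the NaN result is an assumption that holds on platforms with IEEE 754 doubles (std::numeric_limits<double>::is_iec559 is true); the C++ standard itself does not define floating-point division by zero.

```cpp
#include <cmath>
#include <vector>

// Hypothetical variant with no special case for an empty sample.
// For an empty vector this evaluates 0.0 / 0.0, which produces NaN
// under IEEE 754 arithmetic (assumed here, not guaranteed by the standard).
double average_unchecked(const std::vector<double>& sample)
{
    double acc = 0.0;
    for (double x : sample) acc += x;
    return acc / static_cast<double>(sample.size());
}
```

Under that assumption, std::isnan(average_unchecked({})) is true, so the explicit quiet_NaN() return in the question mostly makes the same outcome visible in the code.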

Be aware that there are holy wars about C++ and exceptions. For example, read this: To throw or not to throw exceptions?

Community
xanatos
  • I don't see a war here. The top two answers (the only ones with votes) agree with each other. Exceptions are fine (it all depends on the situation and usage). – Martin York Sep 03 '11 at 17:37
  • @Tux-D you should look at the page linked by OP. And in the end C++ isn't exactly "exception friendly" or "exception uniform" in its libraries and in the language. Sadly it's an added feature. – xanatos Sep 03 '11 at 17:40
2

I would leave the behavior undefined.

Just code it for the non-empty case and let the caller think about using it correctly. Probably you won't call it for empty vectors anyway as the check for the empty input will probably be done earlier.
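
One way to write that down is sketched below. The assert is my addition, not part of the suggestion above: it documents the precondition and catches misuse in debug builds, while in release builds (NDEBUG defined) the function simply assumes a non-empty input. The name average_nonempty is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Precondition: sample must not be empty. The behavior for an empty
// sample is deliberately left to the caller to avoid.
double average_nonempty(const std::vector<double>& sample)
{
    // Debug-only check; compiled out when NDEBUG is defined.
    assert(!sample.empty() && "average_nonempty requires a non-empty sample");

    double acc = 0.0;
    for (double x : sample) acc += x;
    return acc / static_cast<double>(sample.size());
}
```

The caller then does the emptiness check wherever it already knows whether data exists, for example right after reading the sample.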

Yakov Galka
1

Your understanding of how to use exceptions is correct and you should go ahead with that approach. Exceptions are meant for this purpose (throw when an exceptional condition happens).

If you return NaN instead, then every time you call average() you have to add an extra check that handles the NaN case.

[Note: On top of that, make sure that the condition (sz == 0) is not a very frequently happening scenario. IMO, I won't use exceptions if they are thrown frequently.]
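
A rough sketch of what that extra check looks like at a call site follows. The names average_or_nan and describe_average are invented for illustration. Note that the check has to use std::isnan, because comparing with == never matches a NaN.

```cpp
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// NaN-returning variant, as in the question's second snippet.
double average_or_nan(const std::vector<double>& sample)
{
    if (sample.empty()) return std::numeric_limits<double>::quiet_NaN();
    double acc = 0.0;
    for (double x : sample) acc += x;
    return acc / static_cast<double>(sample.size());
}

// Hypothetical caller: every use of the result needs a test like this,
// otherwise the NaN silently propagates through later arithmetic.
std::string describe_average(const std::vector<double>& sample)
{
    double avg = average_or_nan(sample);
    if (std::isnan(avg)) return "no data";   // avg == avg would be false for NaN
    return std::to_string(avg);
}
```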

iammilind
  • +1 I agree with not using exceptions if they are thrown frequently – Alessandro Jacopson Sep 03 '11 at 13:35
  • "every time you call the function average() you have to make sure that, you are putting an extra check which takes care of NaN scenario" - most likely the caller will ensure the input vector is non-empty (which might not require any code, since in a lot of cases it will be known to be true), and therefore will not have to check for the NaN. – Steve Jessop Sep 03 '11 at 13:47
1

I would throw, just to keep on the safe side and make sure the error is detected immediately and not, as you said, after a while.

In fact, in this specific use case you can return 0 if it makes sense to say that 0 is the average of "nothing".

What we usually do is validate parameters as soon as we enter the method, and throw ArgumentNullException or OutOfRangeException if the method was really designed to be called only with non-null, properly filled arguments.
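
The exception types named above come from .NET. In C++, a comparable sketch might validate the argument first and throw std::invalid_argument, which is my assumed counterpart rather than something stated in this answer. The names average_checked and try_average are hypothetical.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Validate the argument on entry, then do the actual work.
double average_checked(const std::vector<double>& sample)
{
    if (sample.empty())
        throw std::invalid_argument("average_checked: empty sample");

    return std::accumulate(sample.begin(), sample.end(), 0.0)
           / static_cast<double>(sample.size());
}

// Hypothetical caller that converts the exception into a success flag.
bool try_average(const std::vector<double>& sample, double& out)
{
    try {
        out = average_checked(sample);
        return true;
    } catch (const std::invalid_argument&) {
        return false;   // out is left unchanged on failure
    }
}
```

The error is reported at the point of the bad call, which is the "detected immediately" property this answer is after.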

Davide Piras