
Disclaimer: No, I didn't find any obvious answer, contrary to what I expected!

When looking for code examples regarding the arithmetic mean, the first several examples I can turn up via Google seem to be defined such that the empty sequence generates a mean value of 0.0 (e.g. here and here ...).

Looking at Wikipedia, however, the arithmetic mean is defined as

$$A = \frac{1}{n} \sum_{i=1}^{n} a_i$$

such that an empty sequence would yield 0.0 / 0, so, possibly, that is NaN in the general case.
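
Spelled out for n = 0: the sum has no terms, and an empty sum is 0 by convention, so the formula degenerates to the indeterminate quotient 0/0 (which IEEE 754 floating-point division evaluates to NaN):

$$A = \frac{\sum_{i=1}^{n} a_i}{n} \quad\Longrightarrow\quad A\big|_{n=0} = \frac{\sum_{i=1}^{0} a_i}{0} = \frac{0}{0}$$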

So if I write a utility function that calculates the arithmetic mean of a set of floating point values, should I, in the general case:

  • return 0. for the empty sequence?
  • return (Q)NaN for the empty sequence?
  • "throw an exception" in case of empty sequence?
Martin Ba
  • FYI numpy returns `nan` for this: `np.mean(np.array([]))` so it would seem that wikipedia looks correct – EdChum Sep 26 '16 at 15:32
  • I would say it's undefined, as it's 0/0. Whether you want to throw an exception or not is more a matter of the conventions in the environment where you're coding it. – Ami Tavory Sep 26 '16 at 15:32
  • @EdChum: numpy analyses it correctly. – Bathsheba Sep 26 '16 at 15:33
  • @Bathsheba I think so too, in terms of mathematics this just makes sense to me, the exception side is an implementation detail – EdChum Sep 26 '16 at 15:35
  • @EdChum: but do note that integral division by zero is undefined behaviour in C++. – Bathsheba Sep 26 '16 at 15:40
  • @Bathsheba true, but the OP did state their function is for mean of floating point values rather than ints – EdChum Sep 26 '16 at 15:42
  • @EdChum: For once, my answer turns out to be comprehensive ;-) – Bathsheba Sep 26 '16 at 15:43
  • My experience (admittedly with Java) is that I would much rather have a method throw an exception than return NaN. If you get a nice stack trace the exception immediately pinpoints where you have called a method with a length 0 array argument, whereas NaNs can cause trouble far away from the method that is producing them - and very bizarre trouble too because they don't behave like other numbers when you compare them - http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values. – mcdowella Sep 27 '16 at 04:12
  • Pedantic math point: regarding the exception to throw, "division by zero" isn't really correct, as the arithmetic mean is undefined for an empty sequence (so we never get as far as checking whether 1/n is okay). From a programming point of view, I don't think this is a very clear exception either. – Nathan Cooper Sep 27 '16 at 09:35
  • In a 'real' OO world, you'd have a dedicated return type encapsulating a numerical value and/or the state of that value, i.e. if it is defined or not; the state field would have a meaningful name, like `bool mathematicallyDefined` &c. This way, no information is lost, the user can check the status of the result and act according to his needs, plus it's trivial to wrap that result into another object to return the desired value/behavior, i.e. `0.0`, exception, `NaN`, `-1`, or whatever. - The issue is, btw, about the same as the question about if/how `null` values should be avoided. – JimmyB Sep 27 '16 at 11:58
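
JimmyB's suggestion of a dedicated return type can be sketched in C++17 with `std::optional`, which encodes "defined or not" without needing a sentinel value. This is a minimal sketch, assuming C++17; the function name `checked_mean` is hypothetical.

```cpp
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical helper: returns std::nullopt when the mean is mathematically
// undefined (empty input) instead of returning 0.0, returning NaN, or throwing.
std::optional<double> checked_mean(const std::vector<double>& values) {
    if (values.empty()) {
        return std::nullopt;  // "not mathematically defined"
    }
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    return sum / static_cast<double>(values.size());
}
```

Each caller then picks its own policy: `checked_mean(v).value_or(0.0)` gives the 0.0 convention, while `checked_mean(v).value()` throws `std::bad_optional_access` for an empty input.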

5 Answers

There isn't an obvious answer because the handling depends on how you want to inform calling code of the error. (Or even whether you want to interpret this as an "error" at all.)

Some libraries/programs really don't like raising exceptions, so they do everything with signal (sentinel) values. In that case, returning NaN (because the value of the expression is technically undefined) is a reasonable choice.

You might also want to return NaN if you want to "silently" bring the value forward through multiple other calculations. (Relying on the behavior that NaN combined with anything else is "silently" NaN.)

But note that if you return NaN for the mean of an empty sequence, you impose the burden on calling code that they need to check the return value of the function to make sure that it isn't NaN - either immediately upon return or later on. This is a requirement that is easy to miss, depending on how fastidious you are in checking return values.
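
A minimal C++ sketch of the NaN convention, assuming IEEE 754 `double` and no `-ffast-math`-style flags (which may optimize NaN checks away); `mean_or_nan` and the two helpers are hypothetical names. It shows both sides of the trade-off: the NaN silently survives further arithmetic, and every ordinary comparison with it is false, so the caller has to test for it explicitly.

```cpp
#include <cmath>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper: an empty input yields a quiet NaN rather than 0.0.
double mean_or_nan(const std::vector<double>& values) {
    if (values.empty()) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return std::accumulate(values.begin(), values.end(), 0.0)
           / static_cast<double>(values.size());
}

// Downstream code that forgets to check: the NaN flows straight through.
double scaled_and_shifted(double m) {
    return 2.0 * m + 10.0;  // NaN in, NaN out
}

// The check the caller is now responsible for.
bool is_valid_mean(double m) {
    return !std::isnan(m);
}
```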

Because of this, other libraries/programs take the viewpoint that error conditions should be "noisy" - if you passed an empty sequence to a function that's finding the mean of the sequence, then you're obviously doing something seriously wrong, and it should be made abundantly clear to you that you've messed up.

Of course, if exceptions can be raised, they need to be handled, but you can do that at a higher level, potentially centralized at the point where it makes more sense to do so. Depending on your program, this may be easier, or more in line with your standard error-handling scheme, than double-checking return values.
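
A sketch of the "noisy" approach in C++, assuming the standard `std::invalid_argument` exception; `mean_or_throw` and `summarize` are hypothetical names. The low-level function fails immediately, while the decision about what to do with the failure is made once, in a higher-level caller.

```cpp
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical low-level helper: refuses to compute an undefined mean.
double mean_or_throw(const std::vector<double>& values) {
    if (values.empty()) {
        throw std::invalid_argument("mean of an empty sequence is undefined");
    }
    return std::accumulate(values.begin(), values.end(), 0.0)
           / static_cast<double>(values.size());
}

// Hypothetical higher-level caller: error handling is centralized here
// instead of being repeated after every individual call.
std::string summarize(const std::vector<std::vector<double>>& series) {
    std::string report;
    try {
        for (const auto& s : series) {
            report += std::to_string(mean_or_throw(s)) + ";";
        }
    } catch (const std::invalid_argument& e) {
        report += std::string("error: ") + e.what();
    }
    return report;
}
```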

Other people would argue that your functions should be robust to the error. For maximum robustness, you probably shouldn't use either NaN or an exception - you need to choose an actual number which "makes sense" as a value for the average of an empty list.

Which value is going to be highly specific to your use case. For example, if your sequence is a list of differences/errors, you might want to return 0. If you're averaging test scores (scored 0-100), you might want to return 100 for an empty list ... or 0, depending on what your philosophy of the "starting" score is. It all depends on what the return value is going to be used for.

Given that this "neutral" value is going to vary widely by use case, you might want to implement it in two functions: one general function which returns NaN or raises an exception, and another that wraps the general function and recognizes the 'error' case. This way you can have multiple versions, each with a different "default" case. Or, if this is something you're doing a lot of, you might even make the "default" value a parameter you can pass.
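
One way to realize the two-layer design in C++; `general_mean`, `mean_or_default` and `make_mean_with_default` are hypothetical names, and this is a sketch under the assumption of IEEE 754 `double`, not the only possible structure. The wrapper takes the use-case-specific default as a parameter, and the factory produces fixed-default variants.

```cpp
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical general function: no special case, so an empty input
// evaluates 0.0 / 0.0, which is NaN under IEEE 754.
double general_mean(const std::vector<double>& values) {
    return std::accumulate(values.begin(), values.end(), 0.0)
           / static_cast<double>(values.size());
}

// Hypothetical wrapper: recognizes the empty case and substitutes a
// caller-chosen "neutral" value.
double mean_or_default(const std::vector<double>& values, double fallback) {
    return values.empty() ? fallback : general_mean(values);
}

// Hypothetical factory for several fixed-default versions.
std::function<double(const std::vector<double>&)> make_mean_with_default(double fallback) {
    return [fallback](const std::vector<double>& values) {
        return mean_or_default(values, fallback);
    };
}
```

For example, `make_mean_with_default(0.0)` suits a list of errors, while `make_mean_with_default(100.0)` matches a "start from full marks" scoring policy.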

Again, there isn't a single answer to this question: the average of an empty sequence is undefined. How you want to handle it depends intimately on what the result of the calculation is being used for: Just display, or further calculation? Should an empty list be exceptional, or should it be handled quietly? Do you want to handle the special case at the point in time it occurs, or do you want to hoist/defer the error handling?

R.M.

Mathematically, it's undefined as the denominator is zero.

Because the behaviour of integer division by zero is undefined in C++, throw an exception if you're working in integral types.

If you're working in IEEE 754 floating point, then return NaN, since the numerator will also be zero. (+Inf would be returned if the numerator were positive, and -Inf if it were negative.)
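
A sketch of that type-dependent rule, assuming C++17 and IEEE 754 floating point; `typed_mean` is a hypothetical name. For integral `T` it throws instead of evaluating `sum / 0`; for floating-point `T` it simply performs the division, so an empty input produces NaN.

```cpp
#include <numeric>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical helper: the result type matches the element type T.
template <typename T>
T typed_mean(const std::vector<T>& values) {
    static_assert(std::is_arithmetic_v<T>, "typed_mean needs an arithmetic type");
    const T sum = std::accumulate(values.begin(), values.end(), T{0});
    const T n = static_cast<T>(values.size());
    if constexpr (std::is_integral_v<T>) {
        // Integer division by zero is undefined behaviour: refuse instead.
        if (n == 0) {
            throw std::domain_error("mean of an empty integer sequence is undefined");
        }
    }
    // For IEEE 754 types, an empty input evaluates 0.0 / 0.0, which yields NaN.
    return sum / n;
}
```

For example, `typed_mean(std::vector<int>{1, 2, 3, 4})` is `2` (integer division truncates), while `typed_mean(std::vector<double>{})` is NaN.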

Bathsheba

I suggest keeping the same behavior as for a 0.0 by 0 division, whatever it is. Indeed, one can adopt the as-if rule. This way you remain coherent with other operations and you don't have to make the decision yourself.

(You could even implement it as such, by returning 0.0/0, but the compiler might optimize this in unexpected ways.)
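
Taken literally, the suggestion could look like the following sketch, assuming IEEE 754 `double`; `mean_as_if_divided` is a hypothetical name. The `0` in `0.0 / 0` is converted to `0.0` by the usual arithmetic conversions, so this is a floating-point division, not the undefined integer one.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: the empty case behaves exactly like the division
// 0.0 / 0, so the result stays consistent with the platform's own
// floating-point semantics (NaN under IEEE 754).
double mean_as_if_divided(const std::vector<double>& values) {
    if (values.empty()) {
        return 0.0 / 0;  // the int 0 is converted to 0.0 before dividing
    }
    return std::accumulate(values.begin(), values.end(), 0.0)
           / static_cast<double>(values.size());
}
```

If you would rather not write a literal division by zero, `std::numeric_limits<double>::quiet_NaN()` (from `<limits>`) produces a NaN explicitly.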

  • Upvoted. There is no naivety in returning `0.0 / 0` at all. The compiler is not allowed to optimise this in any unexpected way. It *must* evaluate this as `0.0 / 0.0`. In doing so you achieve *consistency* with the way your platform implements floating point poles. I believe this answer is superior to mine. – Bathsheba Sep 27 '16 at 07:57

I like defensive coding, so I would throw an exception. You can make it either a specific exception (like empty_sequence_exception) or a division-by-zero exception, since the divisor is the length of the sequence, which is 0.

0.0 is debatable since there is no data (sequence).
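
A sketch of the specific-exception option in C++. The name `empty_sequence_exception` comes from the answer, but the class itself, its base `std::domain_error`, and `strict_mean` are assumptions for illustration. Deriving from a standard exception lets callers catch either the precise type or the broader category.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical exception type: names the actual problem (no data) rather
// than reporting it as a division by zero.
class empty_sequence_exception : public std::domain_error {
public:
    empty_sequence_exception()
        : std::domain_error("arithmetic mean of an empty sequence is undefined") {}
};

// Hypothetical defensive helper.
double strict_mean(const std::vector<double>& values) {
    if (values.empty()) {
        throw empty_sequence_exception{};
    }
    return std::accumulate(values.begin(), values.end(), 0.0)
           / static_cast<double>(values.size());
}
```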

Michel Keijzers
  • It is very customary to define the combination of no elements to be the neutral element (0 for sum, 1 for product, +/-infinity for min/max). –  Sep 26 '16 at 16:58
  • @Yves ... and for mean there is no neutral. – Ben Voigt Sep 26 '16 at 19:00
  • @BenVoigt: right, I advocate 0.0 for the sum. (And in my answer, 0.0/0 for the mean.) –  Sep 26 '16 at 19:09
  • Depending on your use case, 0.0 might be a reasonable return value, but if you're coding a generic `arithmetic_mean()` function, an exception or NaN return makes sense. – Mark Sep 26 '16 at 19:27
  • The sum of the elements of the empty set is zero: this follows from set theory axioms. `0.0` for the sum is therefore correct. – Bathsheba Sep 27 '16 at 07:59
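
The identities mentioned in these comments can be checked with `std::accumulate`, whose initial value is exactly what an empty range returns. A small C++ sketch; the helper names are hypothetical. The mean has no analogous starting value, because it is a ratio rather than a fold over an operation with an identity element.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <numeric>
#include <vector>

// Each fold starts from its operation's identity, which is therefore
// the result for an empty input.
double sum_of(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 0.0);
}

double product_of(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(), 1.0, std::multiplies<double>{});
}

double min_of(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(),
                           std::numeric_limits<double>::infinity(),
                           [](double a, double b) { return std::min(a, b); });
}

double max_of(const std::vector<double>& v) {
    return std::accumulate(v.begin(), v.end(),
                           -std::numeric_limits<double>::infinity(),
                           [](double a, double b) { return std::max(a, b); });
}
```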

The correct answer is that the arithmetic mean of an empty sequence has no meaning, since an empty sequence is essentially an empty set. Division of nothing is meaningless. Zero is certainly not a correct answer. Say a sequence has 3 members, 1, 0 and -1, or is a sequence of all zeros. The mean of both of these would be zero, and should not be confused with an empty sequence.

A Hoffman
  • In maths that's right. In programming, we say "not a number", which most systems allow. 0/0 = NaN. I don't understand the downvote, however. – Malcolm McLean Oct 10 '16 at 09:39