
Why is the behavior of the Haskell range notation different for floats than for integers and chars?

Prelude> [1, 3 .. 10] :: [Int]
[1,3,5,7,9] 
Prelude> [1, 3 .. 10] :: [Float]
[1.0,3.0,5.0,7.0,9.0,11.0]
Prelude> ['a', 'c' .. 'f']
"ace"

I would understand it if the last element was close to the upper bound, but this is obviously not a rounding issue.

undur_gongor

2 Answers


The syntax [e1, e2 .. e3] is really syntactic sugar for enumFromThenTo e1 e2 e3, which is a function in the Enum typeclass.
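
As a quick sanity check, you can call enumFromThenTo yourself in GHCi and get the same results as the range syntax in the question (assuming a standard Prelude):

Prelude> enumFromThenTo 1 3 10 :: [Int]
[1,3,5,7,9]
Prelude> enumFromThenTo 1 3 10 :: [Float]
[1.0,3.0,5.0,7.0,9.0,11.0]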

The Haskell standard defines its semantics as follows:

For the types Int and Integer, the enumeration functions have the following meaning:

  • The sequence enumFrom e1 is the list [e1,e1 + 1,e1 + 2,…].
  • The sequence enumFromThen e1 e2 is the list [e1,e1 + i,e1 + 2i,…], where the increment, i, is e2 − e1. The increment may be zero or negative. If the increment is zero, all the list elements are the same.
  • The sequence enumFromTo e1 e3 is the list [e1,e1 + 1,e1 + 2,…e3]. The list is empty if e1 > e3.
  • The sequence enumFromThenTo e1 e2 e3 is the list [e1,e1 + i,e1 + 2i,…e3], where the increment, i, is e2 − e1. If the increment is positive or zero, the list terminates when the next element would be greater than e3; the list is empty if e1 > e3. If the increment is negative, the list terminates when the next element would be less than e3; the list is empty if e1 < e3.

This is pretty much what you'd expect, but the Float and Double instances are defined differently:

For Float and Double, the semantics of the enumFrom family is given by the rules for Int above, except that the list terminates when the elements become greater than e3 + i∕2 for positive increment i, or when they become less than e3 + i∕2 for negative i.
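
To see where the extra 11.0 comes from, here is a rough sketch of that rule (illustrative code of my own, not the actual Prelude definition): with e1 = 1, e2 = 3, e3 = 10 the increment is i = 2, so the list only stops once the next element would exceed 10 + 1 = 11.

-- Illustrative sketch only, not GHC's real code.
floatEnumFromThenTo :: (Ord a, Fractional a) => a -> a -> a -> [a]
floatEnumFromThenTo e1 e2 e3 = takeWhile keep (iterate (+ i) e1)
  where
    i   = e2 - e1
    lim = e3 + i / 2               -- the half-step slack from the rule above
    keep x | i >= 0    = x <= lim
           | otherwise = x >= lim

With this sketch, floatEnumFromThenTo 1 3 10 :: [Float] gives [1.0,3.0,5.0,7.0,9.0,11.0], just like the range in the question.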

I'm not really sure what the justification for this is, so the only answer I can give you is that it behaves this way because the standard defines it that way.

You can work around this by enumerating using integers and converting to Float afterward.

Prelude> map fromIntegral [1, 3 .. 10] :: [Float]
[1.0,3.0,5.0,7.0,9.0]
hammar
  • Probably the intention is to make things such as `[0.0, 0.1 .. 1.0]` work more-or-less like what one would naively expect not knowing about floating-point imprecisions. – hmakholm left over Monica Sep 03 '11 at 01:21
  • @Henning: I guess that makes sense if one assumes that in the common use case, `e3` would be in the sequence `[e1, e1+i ..]`, ignoring inaccuracies. In the OP's example, `e3` was halfway in between two values, which is in a sense the worst case scenario for that assumption. – hammar Sep 03 '11 at 01:39
  • I would further contend that anyone who doesn't know about floating-point imprecision issues should *not be using floating point values*. In general, use arbitrary-precision rationals where performance isn't a major concern, and learn how to use floats properly where it is. – C. A. McCann Sep 03 '11 at 02:57
  • Agreed. Imprecision comes up way too often on SO questions. There is clearly a failing somewhere in developer education (be it professional or hobbyist). – Thomas M. DuBuisson Sep 03 '11 at 03:44
  • This is a nasty bug in the Haskell specification. It's trying to hide the inherent problems with floating point, but that just means the problem shows up elsewhere. – augustss Sep 03 '11 at 07:29
  • @augustss: even if it was nasty, it would not be a _bug_. – leftaroundabout Sep 03 '11 at 22:20
  • @leftaroundabout: Treating floating point values as if they were sensible, well-behaved numbers is fundamentally a mistake. They're not numbers and it's impossible to pretend as if they are without having it come back to bite you. Putting things into the language specification that try to do exactly that will of course lead to undesired behavior, which meets most definitions of a bug. – C. A. McCann Sep 03 '11 at 22:44
  • @Thomas M. DuBuisson: To my mind, the failing is giving anyone the impression that using floating point values should ever be the *default*, instead of a choice you should be able to justify. An egregious example here is that Haskell's type defaulting picks `Double` for unspecified fractional types instead of `Rational`. – C. A. McCann Sep 03 '11 at 22:49
  • @leftaroundabout Even if it's the specification it can be a bug. This is not how it used to be defined in Haskell, and the earlier one got it right. The current behaviour is very counterintuitive. – augustss Sep 04 '11 at 07:47
  • @C. A. McCann: it's exactly the other way around. The intuitive, I'd call it the _naïve_, definition of float ranges would treat them as if they were well-defined. The current definition accounts for the uncertainties, not by pretending they aren't there but by making sure you get the best (and most importantly, least _biased_) possible results in spite of the calculation errors. – leftaroundabout Sep 04 '11 at 12:35
  • @augustss: yes, the behaviour is counterintuitive. But the behaviour of floating point numbers is counterintuitive itself, so it's no good to treat them in intuitive ways. If you want to prevent all counterintuitive things, you must not use floats – but that's a high price to pay, as they are by far the fastest type of non-integer numbers. – leftaroundabout Sep 04 '11 at 12:41
  • @leftaroundabout I know how floating point works, and attempts to work around the problems just cause me more problems. I expect `enumFromThenTo` to keep adding the increment and taking elements as long as they are <= the upper bound. Getting an element above the upper bound is very confusing. – augustss Sep 05 '11 at 10:21
  • @augustss: you say getting an element above the upper bound is very confusing. If that is so, you're relying on a float to fulfill an exact requirement. But floats cannot be trusted to fulfill exact requirements: if it's ok to have a value close to or **on** the upper bound, then a _single_ value a short way past the border should be ok, too. (This wouldn't apply to open intervals, but finite Haskell ranges represent closed intervals, hence the ***on***.) — Worse, on the other hand, would be a statistical bias of the _average_, and that's what you get with the naïve/intuitive definition. – leftaroundabout Sep 05 '11 at 11:14
  • @leftaroundabout If you ask people what they intuitively expect from the list `[x .. y]`, I think you'll find that they expect each element to be >= x and <= y. That is easy to achieve, and it used to be this way for floating point. But then it was "improved". The extra element is also not "close" to the upper bound; it can be half a step away. – augustss Sep 05 '11 at 15:16
  • @augustss: well, maybe you should give some examples of applications where the intuitive definition actually leads to useful results. As far as I can see, all applications where the possible overshoot of the last value (be it expected or not) might be an actual problem are not safe use of floating point anyway; however it's possible that I've missed something. As for applications where the current definition leads straightforwardly to the best possible results while the intuitive definition leads to unpredictable, statistically significant errors, see my answer below (there are more examples). – leftaroundabout Sep 05 '11 at 21:27

Ok, @Henning Makholm already said this in his comment, but he didn't explain why this actually is a better solution.

First thing to say: when dealing with floating point, we must always be aware of possible rounding errors. When we write [0.0, 0.1 .. 1.0], we must be aware that all these numbers, except for the first one, will not lie exactly on the tenths. Where we need that kind of exactness, we must not use floats at all.
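
For instance (a quick check with Double; this is the familiar IEEE-754 effect, the exact digits could differ on an unusual platform):

Prelude> 0.1 + 0.1 + 0.1 :: Double
0.30000000000000004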

But of course there are many applications where we're content with reasonable certainty but need high speed. That's where floats are great. One possible application of such a list would be a simple numerical integration with the trapezoid rule:

-- Trapezoid rule: sum the samples, weighting the two endpoints by half a step.
trIntegrate f l r s = sum [ f x | x <- [l, l+s .. r] ] * s - (f l + f r) * s / 2

Let's test this: trIntegrate ( \x -> exp(x + cos(sqrt(x) - x*x)) ) 1.0 3.0 0.1 => 25.797334337026466
Compared to 25.9144, that's an error of less than one percent. Not exact, of course, but that's inherent to the integration method.

Suppose now that float ranges were defined to always terminate when crossing the right border. Then, it would be possible (but we can't be certain about it!) that only 20 values rather than 21 are calculated in the sum, because the last value of x happens to be 3.000000something. We can simulate this

bad_trIntegrate f l r s = sum [ f x | x <- [l, l+s .. r-s] ] * s - (f l + f r) * s / 2

then we get

bad_trIntegrate ( \x -> exp(x + cos(sqrt(x) - x*x)) ) 1.0 3.0 0.1
=> 21.27550564546988

Urgh!

This has nothing to do with hiding the problems with floating point. It's just a method to help the programmer get around these problems more easily. In fact, the counterintuitive result of [1, 3 .. 10] :: [Float] helps to remind us of these problems!

leftaroundabout
  • But why was the same behaviour specified for Rationals? Here there are no imprecisions. – Josephine Feb 14 '12 at 12:03
  • @André that is a bit uncanny indeed. It's a choice between whether the behaviour is incompatible with integers or with floats. IMO, rationals are rather more closely related to floats, so it's somewhat reasonable to put them in that box rather than the `Integral` one. — In an interval of fractional numbers (as opposed to an integer one) it is always possible to choose a step size of which the interval size is an integral multiple. In that case, ranges of `Rational`s and ranges of `Float`s behave just as one would intuitively expect, so we should simply make sure this condition is always fulfilled. – leftaroundabout Feb 14 '12 at 14:03
  • After having thought about it, I agree about the interval size. Someone on IRC even considered it an error if this condition was not fulfilled. At first I thought that was ridiculous, but now I see his point. – Josephine Feb 15 '12 at 14:38
  • I agree that simplifies life in many cases. However, it introduces new problems: `sum [dx * sqrt (1 - x) | x <- [0.0, dx .. 1]]` returns `NaN` for `dx = 0.001` because the "overshoot" leaves the domain of the integrated function. I'm not sure this is worth it. – undur_gongor Apr 26 '18 at 19:26
  • @undur_gongor such a sum would normally also give a wrong result when the last term is omitted; only in special cases like `sqrt (1 - 1)` does this not matter. Again I'd argue that the _obviously wrong_ NaN is less bad than a sensible-sounding yet order-0-inaccurate float number. — To be really sure for stuff like this, using floats that may touch the domain boundaries is just fundamentally the wrong approach; one should calculate _cell midpoints_ instead or else only use range syntax for the inter-cell boundaries but add the exact outer boundaries separately. – leftaroundabout Apr 26 '18 at 19:32