Consider the following code:

import Text.Parsec
import Text.Parsec.Language
import Text.Parsec.String
import qualified Text.Parsec.Token as Token

float :: Parser Double
float = Token.float (Token.makeTokenParser emptyDef)

myTest :: String -> Either ParseError Double
myTest = parse float ""

Thanks to QuickCheck, I found a magic number (I have aligned the result for convenience):

λ> myTest "4.23808622486133"
Right      4.2380862248613305

Some floating-point numbers cannot be represented exactly in memory, and some operations easily introduce «fluctuations» into floating-point values. We all know that. However, the cause of this parsing problem seems to be different.

A few words about the tests that helped me discover this… feature. Put simply, in these tests a floating-point value is generated, printed, and parsed back (with Parsec). For example, the number 9.2 is known to be impossible to represent exactly as a floating-point value, yet it passes the tests (evidently because of the «smart» printing function). Why does 4.23808622486133 fail?
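The round-trip check described above can be sketched with base alone. This is only an illustration under stated assumptions: `read` stands in for the Parsec-based parser, the sample list replaces QuickCheck's generated values, and the names `roundTrips`, `samples`, and `report` are hypothetical.

```haskell
-- Round-trip check: print a Double, parse it back, compare with the original.
-- Assumption: `read` stands in for the Parsec parser used in the question.
roundTrips :: Double -> Bool
roundTrips x = read (show x) == x

-- Sample inputs taken from the question (QuickCheck would generate these).
samples :: [Double]
samples = [9.2, 4.23808622486133]

-- Printed form of each sample paired with the result of the check.
report :: [(String, Bool)]
report = [(show x, roundTrips x) | x <- samples]
```

Swapping `read` for `myTest` (and unwrapping the `Either`) gives the Parsec version of the same property, which is the one that fails for the magic number.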


For those who believe that these numbers are the same and that 4.23808622486133 is just the shortest unambiguous representation of 4.2380862248613305:

a1 :: Double
a1 = 9.2000000000000003

a2 :: Double
a2 = 9.200000000000001

b1 :: Double
b1 = 4.23808622486133

b2 :: Double
b2 = 4.2380862248613305

Now:

λ> a1 == a2
True
λ> b1 == b2
False
Mark Karpov
  • Some people will disagree now again, but I'll keep saying: when you're using floating-point, you should _never expect anything to be exactly-equal_. – leftaroundabout Apr 23 '15 at 10:43
  • @leftaroundabout, absolutely correct. However it's interesting why this sort of thing happens in parsing. I mean, suppose there is a number `x` that cannot be represented exactly as floating point number, but when it's printed, let's say there is `x'` number that is actually printed. When you parse it back, you should get `x` again. I expect that there is some kind of consistency in that... – Mark Karpov Apr 23 '15 at 10:47
  • I regret having written my answer. Could you unaccept it please? The other one seems more useful to Haskell programmers. – Pascal Cuoq Apr 24 '15 at 18:16
  • @PascalCuoq, I've seen your discussion with DanielWagner and I understand why you want me to unaccept the answer. However, I think your answer is great, because it provides firm evidence that that bug does exist. This bug is in a third party library, Parsec, not in GHC. Theoretical part of your answer is also enlightening. I think your answer should not be unaccepted because of one downvote. Thank you again for your time and effort. – Mark Karpov Apr 25 '15 at 13:28

2 Answers

Parsec does the conversion to Double using what amounts to

foldr (\d acc -> read [d] + acc / 10) 0 "423808622486133" :: Double

and as you point out, this is not equal to

423808622486133 / 100000000000000 :: Double

I agree that this should be considered a bug in Parsec.
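To compare the two conversions side by side, here is a minimal sketch. It follows the fold as quoted in this answer rather than Parsec's actual source, and the names `viaFold`, `viaDivision`, and `agree` are hypothetical. Both functions assume a non-empty string of decimal digits with the decimal point implied after the first digit.

```haskell
import Data.Char (digitToInt)

-- Digit-by-digit conversion in the style of the fold quoted above.
-- Every step performs a floating-point division by 10, so rounding
-- error can accumulate across the digits.
viaFold :: String -> Double
viaFold = foldr (\d acc -> fromIntegral (digitToInt d) + acc / 10) 0

-- Single-division conversion: read the digits as an exact Integer,
-- then divide once by the matching power of ten.
-- For up to 15 digits both operands are exactly representable as
-- Doubles, so the division is the only step that rounds.
viaDivision :: String -> Double
viaDivision ds = fromInteger (read ds) / 10 ^ (length ds - 1)

-- True when both conversions produce the same Double.
agree :: String -> Bool
agree ds = viaFold ds == viaDivision ds
```

Evaluating `agree "423808622486133"` in GHCi checks the digits from the question. Short inputs such as `"125"` are computed exactly by both functions, so any disagreement only shows up on longer digit strings.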

Reid Barton

This is still not fixed in Parsec. If this exact problem breaks your day, take a look at Megaparsec, a fork of Parsec that fixes many bugs and conceptual flaws, improves the quality of error messages, and more.

As you can see, this problem is fixed there:

λ> parseTest float "4.23808622486133"
4.23808622486133
λ> parseTest float "4.2380862248613305"
4.2380862248613305

Disclosure: I'm one of the authors of Megaparsec.

Mark Karpov
  • When discussing your own projects/packages on SO, you are expected to disclose your connection to them. – dfeuer Sep 25 '15 at 16:36