
In R I am finding some odd behaviour that I can't explain, and I am hoping someone here can. I believe that the value of 100! is the big number assigned to n further down.

A few lines from the console showing expected behaviour...

> factorial( 10 )
[1] 3628800
> prod( 1:10 )
[1] 3628800
> prod( as.double(1:10) )
[1] 3628800
> cumprod( 1:10 )
[1]       1       2       6      24     120     720    5040   40320  362880 3628800

However, when I try 100!, I get the following (notice how the resulting numbers begin to differ about 14 digits in):

> options(scipen=200) #set so the whole number shows in the output
> factorial(100)
[1] 93326215443942248650123855988187884417589065162466533279019703073787172439798159584162769794613566466294295348586598751018383869128892469242002299597101203456
> prod(1:100)
[1] 93326215443944102188325606108575267240944254854960571509166910400407995064242937148632694030450512898042989296944474898258737204311236641477561877016501813248
> prod( as.double(1:100) )
[1] 93326215443944150965646704795953882578400970373184098831012889540582227238570431295066113089288327277825849664006524270554535976289719382852181865895959724032
> all.equal( prod(1:100) , factorial(100) , prod( as.double(1:100) ) )
[1] TRUE

If I do some tests against a variable set to the 'known' number of 100! then I see the following:

# This is (as far as I know) the 'true' value of 100!
> n<- as.double(93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000)
> factorial(100) - n
[1] -1902315522848807765998160811905210717565551993186466795054798772271710903343294674760811531554315419925519536152107160826913610179566298858520576
> prod(1:100) - n
[1] -48777321098687378615337456715518223527321845979140174232174327494146433419058837814379782860367062049372295798771978482741374619988879457910784
> prod(as.double(1:100)) - n
[1] 0

The final result evaluates to zero, but the number returned for prod( as.double( 1:100 ) ) does not display as I would expect, even though prod( as.double( 1:100 ) ) - n evaluates correctly, where n is a variable set to the value of 100!.
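For reference, the zero difference means the two values are in fact the identical double, as a quick self-contained check shows (n re-entered here so the snippet stands alone):

```r
# Both sides are the very same double, even though neither prints as the
# exact decimal expansion of 100!.
n <- as.double(93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000)
identical(prod(as.double(1:100)), n)
# [1] TRUE
```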

Can anyone explain this behaviour to me please? As far as I am aware it should not be related to overflow etc., since I am using an x64 system. Version and machine info below:

> .Machine$double.xmax
[1] 1.798e+308
> str( R.Version() )
List of 14
 $ platform      : chr "x86_64-apple-darwin9.8.0"
 $ arch          : chr "x86_64"
 $ os            : chr "darwin9.8.0"
 $ system        : chr "x86_64, darwin9.8.0"
 $ status        : chr ""
 $ major         : chr "2"
 $ minor         : chr "15.2"
 $ year          : chr "2012"
 $ month         : chr "10"
 $ day           : chr "26"
 $ svn rev       : chr "61015"
 $ language      : chr "R"
 $ version.string: chr "R version 2.15.2 (2012-10-26)"
 $ nickname      : chr "Trick or Treat"

Can anyone explain this to me? I don't doubt that R does everything correctly and that this is most likely useR related. You might point out that, since prod( as.double( 1:100 ) ) - n evaluates correctly, there is nothing to be bothered about, but I am doing Project Euler Problem 20, so I need the correct digits displayed.

Thanks

Simon O'Hanlon
    To calculate the precise value of `100!` using R, do: `library(gmp); factorialZ(100)` – Josh O'Brien Jan 14 '13 at 15:15
  • Thank you to all posters on this question. I think I now have a slightly better understanding of large integers in R. With `library(gmp)` as has been suggested by a couple of people I note that I can do `identical(factorialZ(100) , prod(as.bigz(1:100)))` which will return `[1] TRUE`. – Simon O'Hanlon Jan 14 '13 at 15:43

4 Answers


This has to do not with the maximum value for a double but with its precision.

100! has 158 significant (decimal) digits. IEEE doubles (64-bit) have 52 explicitly stored bits for the mantissa (53 including the implicit leading bit), so you get rounding errors once about 16 decimal digits of precision have been exceeded.
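You can see the 53-bit limit directly in R: above 2^53, consecutive whole numbers can no longer be told apart as doubles:

```r
# 2^53 is the point beyond which not every integer has an exact double
2^53 == 2^53 + 1      # adjacent integers collide above the limit
# [1] TRUE
2^53 - 1 == 2^53 - 2  # below it they are still distinct
# [1] FALSE
```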

Incidentally, 100! is in fact, as you suspected,

93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

so all of the values R calculated are incorrect.

Now I don't know R, but it seems that all.equal() converts all three of those values to floats before comparing, and so their differences are lost.

Tim Pietzcker
  • Please carefully check my value of `n` above. It is EXACTLY the same as what you posted. I will wait a day or so to see if there are any other opinions on this before I mark this question as answered. Thanks. – Simon O'Hanlon Jan 14 '13 at 11:08
  • @SimonO101: You're right; I had overlooked that one when doing my comparisons. I have edited my answer accordingly. – Tim Pietzcker Jan 14 '13 at 11:10
  • Thanks Tim. Could you explain why R correctly evaluates `prod( as.double( 1:100 ) - 100!` (i.e. `=0`) if it is not correctly calculating the number in the first place? Also, if a 158 digit integer starts differing after the 14th digit that seems like a massive amount of rounding error? – Simon O'Hanlon Jan 14 '13 at 11:12
  • Also, on the same machine, if I use J, I get the right number straight away! I admit that I don't really know J (yet) so perhaps it stores number differently in memory? But in J `!100x` gives: `93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000` – Simon O'Hanlon Jan 14 '13 at 11:15
  • @SimonO101: It's calculating "correctly" because it's throwing away all the digits after the precision limit has been reached. It's just like with any calculator: 10**100 + 1 == 10**100. J (like Python, which I used for my calculations) appears to be using arbitrary-length integers. As soon as you convert those to a floating point number, though - BAM! precision gone. According to [this question](http://stackoverflow.com/q/2053397/20670), R doesn't seem to have such a "BigInt" datatype. So you either need to implement it yourself or use Python for this Project Euler challenge. – Tim Pietzcker Jan 14 '13 at 11:15
  • 4
    One possible solution would be the gmp package which provides an interface to the GNU Multiple Precision library from R. – Jackson Jan 14 '13 at 11:51
  • 2
    @TimPietzcker, that's correct - R does _not_ have a built-in type for large integers like Python does. It's targeted at statistical data processing and not at solving challenges from the number theory. – Hristo Iliev Jan 14 '13 at 13:32

Your test with all.equal does not produce what you expect. all.equal can only compare two values. The third argument is positionally matched to tolerance, which gives the tolerance of the comparison operation. In your invocation to all.equal you give it a tolerance of 100! which definitely leads to the comparison being true for absurdly differing values:

> all.equal( 0, 1000000000, prod(as.double(1:100)) )
[1] TRUE

But even if you give it two arguments only, e.g.

all.equal( prod(1:100), factorial(100) )

it would still produce TRUE because the default tolerance is .Machine$double.eps ^ 0.5, i.e. the two operands have to match to about 8 significant digits, which is definitely the case here. On the other hand, if you set the tolerance to 0, then none of the three possible pairings emerges as equal from the comparison:

> all.equal( prod(1:100), factorial(100), tolerance=0.0 )
[1] "Mean relative difference: 1.986085e-14"
> all.equal( prod(1:100), prod( as.double(1:100) ), tolerance=0.0 )
[1] "Mean relative difference: 5.22654e-16"
> all.equal( prod(as.double(1:100)), factorial(100), tolerance=0.0 )
[1] "Mean relative difference: 2.038351e-14"

Also note that just because you've told R to print the number in full (by suppressing scientific notation) doesn't mean that all the displayed digits are correct. Indeed, the exact decimal expansion of 1/2^53 has 53 digits after the decimal point, but only the first ~16 significant digits of any double are meaningful.
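A quick way to see this is to print a simple fraction with far more digits than a double carries (sprintf is base R; everything beyond roughly the 16th significant digit is an artifact of the binary representation, not information):

```r
# 1/3 is not representable exactly in binary; the printed tail is noise:
sprintf("%.25f", 1/3)
```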

This also makes your comparison to the "true" value flawed. Observe this. The ending digits in what R gives you for factorial(100) are:

...01203456

You subtract n from it, where n is the "true" value of 100! so it should have 24 zeroes at the end and hence the difference should also end with the same digits that factorial(100) does. But rather it ends with:

...58520576

This only shows that all those digits are non-significant and one should not really look into their value.

It takes 525 bits of binary precision to represent 100! exactly, that is, about ten times the precision of a double.
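That bit count can be checked with the gmp package (assuming it is installed); sizeinbase() reports the number of base-2 digits of an exact big integer:

```r
library(gmp)
# Count the binary digits of the exact value of 100!
sizeinbase(factorialZ(100), 2)
# [1] 525
```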

Hristo Iliev
  • Many thanks for the R-centric explanation. I will mark this one as the correct answer rather than Tim's (sorry!) because I feel it better answers my original question given the [r] tag. Thanks. – Simon O'Hanlon Jan 14 '13 at 14:03

I will add a third answer just to describe graphically the behaviour you are encountering. Essentially, double precision is sufficient for exact factorial calculation up to 22!; beyond that the results diverge more and more from the real value.

Around 50! there is a further distinction between the two methods, factorial(x) and prod(1:x), with the latter yielding, as you indicated, values closer to the "real" factorial.

[Figure: Factorial calculation precision in R]

Code attached:

# Precision of factorial calculation (very important for the Fisher's Exact Test)
library(gmp)
perfectprecision<-list()
singleprecision<-c()
doubleprecision<-c()
for (x in 1:100){
    perfectprecision[x][[1]]<-factorialZ(x)
    singleprecision<-c(singleprecision,factorial(x))
    doubleprecision<-c(doubleprecision,prod(1:x))
}


plot(0, col="white", xlim=c(1,100), ylim=c(0, log10(abs(doubleprecision[100]-singleprecision[100])+1)),
     ylab="Log10 Absolute Difference from Big Integer", xlab="x!")
for(x in 1:100) {
    points(x,log10(abs(perfectprecision[x][[1]]-singleprecision[x])+1),pch=16,col="blue")
    points(x,log10(abs(perfectprecision[x][[1]]-doubleprecision[x])+1),pch=20,col="red")
}
legend("topleft",col=c("blue","red"),legend=c("factorial(x)","prod(1:x)"),pch=c(16,20))
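The 22! cut-off can also be checked without a plot, by comparing the double products digit-for-digit against exact gmp values (this sketch assumes the gmp package is installed):

```r
library(gmp)
# 22! contains 19 factors of two, so it needs only 51 significant bits and
# still fits a 53-bit double mantissa exactly; 23! needs 56 bits and no
# longer does, so the double product is already rounded.
sprintf("%.0f", prod(1:22)) == as.character(factorialZ(22))  # [1] TRUE
sprintf("%.0f", prod(1:23)) == as.character(factorialZ(23))  # [1] FALSE
```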
Federico Giorgi

Well, you can tell from the body of factorial that it calls gamma, which calls .Primitive("gamma"). What does .Primitive("gamma") look like? Like this.

For large inputs, .Primitive("gamma")'s behaviour is on line 198 of that code. It's calling

exp((y - 0.5) * log(y) - y + M_LN_SQRT_2PI +
            ((2*y == (int)2*y)? stirlerr(y) : lgammacor(y)));

which is just an approximation.
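So factorial() inherits the gamma approximation, while prod() does straight (per-step rounded) multiplication, which is why the two disagree:

```r
# In base R, factorial(x) is defined as gamma(x + 1):
factorial(100) == gamma(101)   # [1] TRUE
# ...and the approximation shows up against direct multiplication:
factorial(100) == prod(1:100)  # [1] FALSE
```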


By the way, the article on Rmpfr uses factorial as its example. So if you're trying to solve the problem, "just use the Rmpfr library".
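A minimal sketch, assuming the Rmpfr package is installed; factorialMpfr() chooses enough precision bits by default to hold n! exactly:

```r
library(Rmpfr)
# An exact, 525-bit multiple-precision representation of 100!
factorialMpfr(100)
```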

isomorphismes