
When I create a data frame from numeric vectors, R seems to truncate the values to less precision than I require in my analysis:

data.frame(x=0.99999996)

returns 1 (*but see Update 1)

I get stuck when fitting `spline(x, y)`: two of the x values are set to 1 by rounding while y changes. I could hack around this, but I would prefer to use a standard solution if one is available.

Example

Here is an example data set

d <- data.frame(x = c(0.668732936336141, 0.95351462456867,
0.994620622127435, 0.999602102672081, 0.999987126195509, 0.999999955814133,
0.999999999999966), y = c(38.3026509783688, 11.5895099585560,
10.0443344234229, 9.86152339768516, 9.84461434575695, 9.81648333804257,
9.83306725758297))
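To check the values R actually stores, rather than the ones it prints, a quick inspection in base R may help. This sketch re-creates `d` from above so it runs on its own; it only looks at the stored x values and makes no claim about the model behind y.

```r
# Re-create the example data frame from the question
d <- data.frame(x = c(0.668732936336141, 0.95351462456867,
  0.994620622127435, 0.999602102672081, 0.999987126195509, 0.999999955814133,
  0.999999999999966), y = c(38.3026509783688, 11.5895099585560,
  10.0443344234229, 9.86152339768516, 9.84461434575695, 9.81648333804257,
  9.83306725758297))

print(d$x, digits = 15)  # show the stored x values with 15 significant digits
anyDuplicated(d$x)       # 0 means no two x values are identical
sum(d$x == 1)            # how many x values are stored as exactly 1
diff(d$x)                # gaps between consecutive x values
```

If `sum(d$x == 1)` is 0 and every gap is positive, the x values were not rounded to 1 when stored; only their printed form is.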

The following solution works, but I would prefer something that is less subjective:

plot(d$x, d$y, ylim=c(0,50))
lines(spline(d$x, d$y),col='grey') #bad fit
lines(spline(d[-c(4:6),]$x, d[-c(4:6),]$y),col='red') #reasonable fit

Update 1

*Since posting this question, I realize that this prints 1 even though the data frame still contains the original value, e.g.

> dput(data.frame(x=0.99999999996))

returns

structure(list(x = 0.99999999996), .Names = "x", row.names = c(NA, 
-1L), class = "data.frame")
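The same point can be checked directly by comparing the printed form with the stored value. A minimal base-R sketch (the name `df` is just for illustration):

```r
df <- data.frame(x = 0.99999999996)

print(df)                       # default printing uses 7 significant digits, so x shows as 1
format(df$x)                    # "1": formatting for display, not the stored value
df$x == 1                       # FALSE: the stored value is not 1
identical(df$x, 0.99999999996)  # TRUE: data.frame() kept the value unchanged
sprintf("%.11f", df$x)          # "0.99999999996"
```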

Update 2

After using dput to post this example data set, and with some pointers from Dirk, I can see that the problem is not truncation of the x values but the numerical error of the model that I used to calculate y. This justifies dropping a few of the effectively equivalent data points (as in the red line of the example).

Asked by David LeBauer, edited by pnuts

2 Answers


If you really want to set up R to print its results with utterly unreasonable precision, then use: options(digits=16).

Note that this does nothing for the accuracy of functions that use these results. It merely changes how values appear when they are printed to the console. There is no rounding of the values as they are stored or accessed unless you enter more significant digits than the mantissa (significand) of a double can hold, which is about 15 to 16 decimal digits. The 'digits' option has no effect on the maximal precision of floating-point numbers.
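A minimal sketch of that distinction in base R: `options(digits = ...)` changes only the display, while comparisons use the stored double either way. `.Machine` reports the precision limits of doubles on the running platform (the values noted below are the usual IEEE 754 ones).

```r
x <- 0.99999996

old <- options(digits = 16)  # affects printing only
print(x)                     # shows 0.99999996
options(old)                 # restore the previous setting
print(x)                     # with the default 7 digits this shows 1

x == 1                       # FALSE under either setting; arithmetic is unchanged
format(x, digits = 16)       # "0.99999996": per-call alternative to options()
.Machine$double.digits       # 53 bits in the significand of an IEEE double
.Machine$double.eps          # 2^-52, about 2.2e-16
```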

Answered by IRTFM
  • Thanks for the answer; although the precision may be unreasonable, it keeps the results, such as spline interpolation, reasonable. – David LeBauer Dec 27 '10 at 18:28
  • I fear you still focus on the wrong issue: `spline(x, y)` will never use the printed values. – Dirk Eddelbuettel Dec 27 '10 at 18:33
  • @Dirk I think now I understand that this is not a problem with R rounding off my x values, but a problem with the error in the model calculating my Y values. – David LeBauer Dec 27 '10 at 18:40

Please re-read R FAQ 7.31 and the reference cited therein, David Goldberg's famous paper "What Every Computer Scientist Should Know About Floating-Point Arithmetic", on how floating-point numbers are represented on computers.
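The FAQ's own example can be reproduced in a few lines. This sketch shows exact comparison failing and a tolerance-based comparison succeeding; `all.equal()` is base R, and its default tolerance is about 1.5e-8 (the square root of `.Machine$double.eps`). The explicit `1e-12` threshold is an arbitrary choice for illustration.

```r
a <- sqrt(2)^2

a == 2                   # FALSE: sqrt(2) cannot be stored exactly
a - 2                    # a tiny non-zero residual, about 4.4e-16
isTRUE(all.equal(a, 2))  # TRUE: comparison within a tolerance
abs(a - 2) < 1e-12       # the same idea with an explicit (arbitrary) tolerance
```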

The closing quote from Kernighan and Plauger is also wonderful:

10.0 times 0.1 is hardly ever 1.0.
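In IEEE double precision a single product `10 * 0.1` happens to round back to exactly 1, so the quote is best illustrated by accumulating the error through repeated addition. A small base-R sketch (the variable `s` is just for illustration):

```r
10 * 0.1 == 1                  # TRUE: this one product rounds back to exactly 1

s <- 0
for (i in 1:10) s <- s + 0.1   # add 0.1 ten times; each addition rounds
s == 1                         # FALSE
sprintf("%.17g", s)            # "0.99999999999999989"
isTRUE(all.equal(s, 1))        # TRUE within all.equal()'s default tolerance
```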

Besides the numerical precision issue, there is of course also the fact that R prints fewer significant digits than it uses internally:

> for (d in 4:8) print(0.99999996, digits=d)
[1] 1
[1] 1
[1] 1
[1] 1
[1] 0.99999996
Answered by Dirk Eddelbuettel
  • I didn't get much from the FAQ; I'll read Goldberg's paper. My problem comes when fitting `spline(x,y)` and two of the x values == 1 due to rounding while y continues to increase. – David LeBauer Dec 27 '10 at 18:04
  • On re-reading the FAQ, I see the relevant point is "If you want much greater accuracy than this you will need to consider error propagation carefully." Thanks for helping me work through this. – David LeBauer Dec 27 '10 at 18:50