3

This is my first time posting on a forum, so please be gentle. I've been programming in R for over a year now.

I'm trying to do a (mathematically very simple) statistical analysis of large datasets that come directly from a mass spectrometer. As you may know, these instruments are extremely precise and can measure very large, as well as very small voltages precisely: 50V to 0.00000000000000010V. The values are then reported to a tab-delimited file, which I can read into R.

However, at this point, I have the following problem: If I convert the data into doubles, I lose significant information. If I keep them in characters or factors, I cannot "use" them and calculate what I need to get.

Is there a work-around, so I can keep the precision AND use R? Would it be better to use a C++-based language, such as Matlab? Would Matlab be able to do this?

Kim
maxalmond
  • 2
    Do you have an example to illustrate your problem? –  Jun 13 '14 at 07:43
  • How precise is the instrument when measuring a large voltage? – Patricia Shanahan Jun 13 '14 at 07:52
  • 2
    Be careful with the precision of floating-point numbers. See [R inferno page 10](http://www.burns-stat.com/pages/Tutor/R_inferno.pdf), chapter "Falling into the Floating point Trap" (e.g. `> 0.1 != (0.3/3)` is `[1] TRUE` and `> print(0.01,digits=22)` is `[1] 0.01000000000000000020817`). I believe it's a problem with the C compiler, so you may have the same problem in other programming languages. – jomuller Jun 13 '14 at 08:05
  • 2
    @jomuller There's no problem with the C compiler. You cannot represent all numbers in a finite sized data type. – David Heffernan Jun 13 '14 at 08:15
  • @DavidHeffernan Thank you for this clarification! I have only a little experience in compiled languages. – jomuller Jun 13 '14 at 08:27
  • @user2441481 If you want help here, you really need to address the first comment in this thread. Until you do so, you are not likely to get real help because the actual problem is not clearly specified. – David Heffernan Jun 13 '14 at 10:09
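One way to check whether the digits are really lost on conversion or only hidden by printing is sketched below. The `raw` strings are made-up readings, not values from the asker's file. R prints 7 significant digits by default, while a double stores roughly 15 to 17, so the two views of the same vector can look very different.

```r
# Hypothetical readings, as they might appear as text in the tab-delimited file
raw <- c("50", "0.00000000000000010", "0.000000000000000123456789")

# Convert the text to doubles
volts <- as.numeric(raw)

# Default printing shows only 7 significant digits
print(volts)

# Asking for more digits shows what is actually stored
print(volts, digits = 15)

# sprintf() gives full control: 16 significant digits in scientific notation
shown <- sprintf("%.15e", volts)
print(shown)

# Relative difference between the stored value and the intended value
rel_diff <- abs(volts[2] / 1e-16 - 1)
print(rel_diff)
```

If the digits only go missing in the printed output, `options(digits = 15)` or explicit formatting may be all that is needed.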

1 Answer

4

You can use the gmp library:

http://cran.r-project.org/web/packages/gmp/

Example (Large Numbers)

install.packages("gmp")
library(gmp)
# Pass the digits as a string: a numeric literal this large is rounded to a double first
largevalue <- as.bigz("2305843009213694080000000")
largevalue

Example (Small Numbers)

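# asNumeric() returns ordinary doubles, which hold about 15 to 17 significant digits at any magnitude
smallvalues <- asNumeric(cbind(0.0000000000000000000001,0.0000000000000000000003))
smallvalues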
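If exact arithmetic on the decimal strings is really required, gmp's rational type `bigq` can hold them without rounding. The following is only a sketch: `dec_to_bigq` is a hypothetical helper, not part of gmp, and it assumes plain non-negative decimal strings without exponent notation. Leading zeros are stripped because, as far as I know, `as.bigz()` treats a string starting with `0` as octal.

```r
library(gmp)

# Hypothetical helper (not part of gmp): turn a plain decimal string such as
# "0.00000000000000010" into an exact rational number.
dec_to_bigq <- function(s) {
  parts  <- strsplit(s, ".", fixed = TRUE)[[1]]
  frac   <- if (length(parts) > 1) parts[2] else ""
  digits <- paste0(parts[1], frac)
  # Strip leading zeros so the string is not read as octal
  digits <- sub("^0+(?=[0-9])", "", digits, perl = TRUE)
  num <- as.bigz(digits)              # all digits as one integer
  den <- as.bigz(10)^nchar(frac)      # 10^(number of decimal places)
  as.bigq(num, den)
}

large <- dec_to_bigq("50")
small <- dec_to_bigq("0.00000000000000010")

# Exact rational arithmetic: the small term survives the addition
total    <- large + small
exact_ok <- (total - large) == small

# The same sum in doubles: the small term is absorbed by the large one
double_ok <- ((50 + 1e-16) - 50) == 1e-16

print(total)
print(exact_ok)
print(double_ok)
```

Whether this is worth the cost depends on the instrument: exact rationals are much slower than doubles, and they only help if the raw data really carry more digits than a double can hold.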
Pork Chop
  • I don't really see how this helps. It would be better for us to diagnose the problem before throwing libraries at it. – David Heffernan Jun 13 '14 at 08:29
  • 1
    @DavidHeffernan The example has been edited to cover the 'Small Numbers' arithmetic problem highlighted in the question. You can do the arithmetic by coercing to a numeric as shown. – Pork Chop Jun 13 '14 at 08:45
  • 1
    Double precision floating point values can work perfectly well with small numbers. What's the big deal with storing `0.00000000000000010` in a double? The resolution of the data type will far exceed the resolution of the raw data. Again, I feel that it would be best to diagnose the true nature of the problem before attacking it with a library. – David Heffernan Jun 13 '14 at 08:48
  • 2
    There's already a nice thread that goes into quite some detail about floating-point arithmetic in R. I had found a solution with the 'gmp' package before, so I thought of it quickly. For more details, refer here: http://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal http://stackoverflow.com/questions/6874867/floating-point-issue-in-r – Pork Chop Jun 13 '14 at 08:57
  • Those topics cover the issue of exact representability. It's not at all clear that is the issue here. – David Heffernan Jun 13 '14 at 09:02
  • 1
    My understanding of the issue here was 'How can I keep the precision to do my calculations using R down to the last point' which I think I answered with my example :). – Pork Chop Jun 13 '14 at 09:07
  • Do you understand the *floating* part of floating point? Broadly, `0.0000000000000001` can be represented as well as `0.1`. – David Heffernan Jun 13 '14 at 09:09
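The point about the *floating* part can be checked directly in base R. This is a rough sketch, and `resolves` and `absorbed` are hypothetical helper names: the spacing between neighbouring doubles scales with the magnitude of the value, so a relative change of about `.Machine$double.eps` is visible at `1e-16` just as it is at `0.1` or `50`.

```r
# Relative spacing of doubles near 1 (about 2.2e-16)
eps <- .Machine$double.eps

# Hypothetical helper: is a relative change of eps still distinguishable?
resolves <- function(x) (x + x * eps) != x

# Hypothetical helper: is a relative change of eps / 4 rounded away?
absorbed <- function(x) (x + x * eps / 4) == x

x <- c(50, 0.1, 1e-16)

# The answers do not depend on how large or small x is
print(resolves(x))
print(absorbed(x))
```

So a double is not less precise for tiny values; trouble only starts when values of very different magnitude are added or subtracted.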