
I imported a dataset containing large numbers, which R automatically displayed in exponential notation. Because I needed to see the full number, I used options(scipen = 999). I then discovered that the imported number did not equal the original number in the dataset. For example, 5765949338897345178 was changed to 5765949338897345536.
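A short base-R sketch of what happens to the value from the question: the literal is stored as the nearest double, and at this magnitude adjacent doubles are 1024 apart, so the trailing digits cannot survive. The gap formula assumes a normal IEEE 754 double with 52 stored fraction bits.

```r
# The literal is parsed into the nearest IEEE 754 double straight away
x <- 5765949338897345178

# Print every integer digit of the stored double (fixed notation, no decimals)
stored <- sprintf("%.0f", x)
stored   # "5765949338897345536"

# Spacing between adjacent doubles at this magnitude: 2^(exponent - 52)
gap <- 2^(floor(log2(x)) - 52)
gap      # 1024
```

The original value is 358 above one representable double and 666 below the next one down, so R keeps the closer one, 5765949338897345536.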

How can these numbers differ? The strange thing is that which(dim_alias1$id == 5765949338897345536) and which(dim_alias1$id == 5765949338897345178) both return the same row number. How is this possible?
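The which() behaviour follows from the same rounding: both literals parse to one double, so both comparisons hit the same element. A toy vector (made up here, standing in for dim_alias1$id) shows it:

```r
# Hypothetical stand-in for dim_alias1$id
ids <- c(1, 5765949338897345178, 3)

# Both literals round to the same double, so they compare equal
same <- 5765949338897345178 == 5765949338897345536
same    # TRUE

hit_a <- which(ids == 5765949338897345536)
hit_b <- which(ids == 5765949338897345178)
hit_a   # 2
hit_b   # 2
```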

slamballais
M.D.
  • There is not enough precision in a double to exactly hold these values. – Matthew Lundberg Feb 19 '17 at 16:59
  • http://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal It might seem counter-intuitive, but questions of the variety "why are these seemingly equal numbers not equal?" and "why are these seemingly unequal numbers treated as equal?" have the same answer. – Dason Feb 19 '17 at 17:06
  • These numbers are ID numbers and must be exact for linking with other datasets. How can this be solved? – M.D. Feb 19 '17 at 17:12
  • That worked for me. Thank you very much for the fast and easy solution! – M.D. Feb 19 '17 at 17:22
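Matthew Lundberg's point can be checked in base R. A double has a 53-bit significand, so integers are only guaranteed to be exactly representable up to 2^53 (9,007,199,254,740,992); the IDs in the question are far beyond that limit:

```r
.Machine$double.digits           # 53 bits of significand
big <- 2^53

big + 1 == big                   # TRUE: 2^53 + 1 is not representable
big - 1 == big                   # FALSE: integers up to 2^53 are exact
5765949338897345178 > big        # TRUE: the IDs are past the exact range
```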

1 Answer


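As you are using the variable as an ID number, it doesn't need to be numeric, so set the column class to character when reading the data in. The digits are then stored as text and never pass through a double, so nothing gets rounded.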
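Applied to the ID from the question, the difference looks like this. The one-row CSV is made up and passed inline through the text argument; read.csv is used instead of read.table only for brevity (it forwards colClasses to read.table).

```r
# Hypothetical one-row file containing the ID from the question
txt <- "id,x\n5765949338897345178,1\n"

# Default: the id column is converted to double and silently rounded
as_num <- read.csv(text = txt)
sprintf("%.0f", as_num$id)   # "5765949338897345536"

# colClasses keeps the id as text, digit for digit
as_chr <- read.csv(text = txt, colClasses = c(id = "character"))
as_chr$id                    # "5765949338897345178"
```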

Example:

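# Small example with a numeric id column
dat <- data.frame(id=12345, x=1)
# Write it to a temporary file (header and row names included)
write.table(dat, tmp <- tempfile())
# Read it back, forcing id to character so no digits can be lost
dat2 <- read.table(tmp, colClasses = c(id="character"))
str(dat2)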

#'data.frame':  1 obs. of  2 variables:
# $ id: chr "12345"
# $ x : int 1
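Since the IDs are needed for linking with other datasets, the character column can be used directly as a merge key. A sketch with two made-up tables, a and b (hypothetical names and values), where the second ID in a is the rounded form of the first:

```r
# Hypothetical tables keyed by character IDs
a <- data.frame(id = c("5765949338897345178", "5765949338897345536"),
                v = 1:2, stringsAsFactors = FALSE)
b <- data.frame(id = "5765949338897345178", w = "match",
                stringsAsFactors = FALSE)

# Character keys are compared digit for digit
joined <- merge(a, b, by = "id")
nrow(joined)                        # 1: only the exact ID matches

# As numbers, the two distinct IDs collapse into one value
length(unique(as.numeric(a$id)))    # 1
```

Both sides of the join need to be read with the ID as character; if either file was read as numeric, the rounding has already happened before the merge.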
user20650