
I stumbled onto a strange behavior in R: double values are not saved to a CSV file with their original value.

Reproducible example:

set.seed(1)
df <- data.frame(ID = 1:10, X = rnorm(10))
write.csv(df, "test.csv", row.names = FALSE)
read.csv("test.csv") == df
        ID     X
 [1,] TRUE FALSE
 [2,] TRUE FALSE
 [3,] TRUE FALSE
 [4,] TRUE FALSE
 [5,] TRUE FALSE
 [6,] TRUE FALSE
 [7,] TRUE FALSE
 [8,] TRUE FALSE
 [9,] TRUE FALSE
[10,] TRUE  TRUE

Granted, the difference appears to occur only at or after the 15th decimal digit, but it makes me uneasy to trust the CSV file.

options(digits = 20)

df[1,]
  ID                    X
1  1 -0.62645381074233242

read.csv("test.csv")[1,]
  ID                    X
1  1 -0.62645381074233197

Is there any way to circumvent this issue? Is it a known bug?

philsf
  • https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal – alistaire Feb 24 '18 at 20:10
  • Thanks @alistaire. I forgot about `all.equal`, silly me. – philsf Feb 24 '18 at 20:13
  • Check `?write.table` -> Details: by default it writes numbers with 15 significant digits. For finer control, convert the numeric columns to character with `format`, setting the desired number of digits. – digEmAll Feb 24 '18 at 20:14
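The `format()` approach from the last comment can be sketched as follows. This is a minimal sketch, not code from the thread: `to_text()` is a hypothetical helper that formats only the double columns, and 17 significant digits is an assumed choice, picked because 17 digits are enough to reproduce any IEEE 754 double exactly. Writing the same data frame without the helper shows the 15-digit loss from the question.

```r
set.seed(1)
df <- data.frame(ID = 1:10, X = rnorm(10))

# Hypothetical helper: turn every double column into text with `digits`
# significant digits; integer and character columns are left untouched.
# trim = TRUE drops the padding format() would otherwise add.
to_text <- function(d, digits = 17) {
  d[] <- lapply(d, function(col) {
    if (is.double(col)) format(col, digits = digits, trim = TRUE) else col
  })
  d
}

plain   <- tempfile(fileext = ".csv")
precise <- tempfile(fileext = ".csv")

# Default behaviour: write.csv uses the equivalent of 15 significant digits
write.csv(df, plain, row.names = FALSE)
# Pre-formatted: 17 significant digits, stored as quoted text
write.csv(to_text(df), precise, row.names = FALSE)

# read.csv converts the quoted numeric text back to doubles
identical(read.csv(plain)$X, df$X)     # FALSE
identical(read.csv(precise)$X, df$X)   # TRUE
```

The cost is a slightly larger file with less readable numbers; the benefit is that the values read back are the exact doubles that were written.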

2 Answers


If you want to increase the precision of what write.csv writes, you can format the numbers yourself with sprintf before writing. "%.20f" prints 20 digits after the decimal point; for values of this magnitude that is more than the 17 significant digits needed to reproduce a double exactly, so R reads back exactly the same numbers.

set.seed(1)
df <- data.frame(ID = 1:10, X = rnorm(10))

write.csv(data.frame(ID = df$ID, X = sprintf("%.20f", df$X)), "test.csv",
          row.names = FALSE)

x <- read.csv("test.csv")
x == df

        ID    X
 [1,] TRUE TRUE
 [2,] TRUE TRUE
 [3,] TRUE TRUE
 [4,] TRUE TRUE
 [5,] TRUE TRUE
 [6,] TRUE TRUE
 [7,] TRUE TRUE
 [8,] TRUE TRUE
 [9,] TRUE TRUE
[10,] TRUE TRUE
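One caveat with "%.20f": it fixes the number of digits after the decimal point, not the number of significant digits, so very small values keep fewer significant digits. The sketch below uses an assumed value far smaller than the rnorm() draws and swaps in "%.17g" (17 significant digits, whatever the magnitude) as an alternative format string; it is an illustration, not part of the original answer.

```r
# Assumed example value, much smaller in magnitude than rnorm() output
x <- 1.234567890123456789e-10

fixed20 <- sprintf("%.20f", x)   # 20 digits after the decimal point
sig17   <- sprintf("%.17g", x)   # 17 significant digits

fixed20                    # "0.00000000012345678901": only 11 significant digits
as.numeric(fixed20) == x   # FALSE: precision was lost
as.numeric(sig17) == x     # TRUE: the double is reproduced
```

In the answer's code, replacing "%.20f" with "%.17g" should therefore keep the round trip exact for columns whose values span very different magnitudes.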
Daniel

As pointed out in the comments, the R FAQ (7.31) explains that an exact comparison with ==, as in the question, is not the proper way to account for machine precision.

For the record, the appropriate test is all.equal:

all.equal(read.csv("test.csv"), df)
[1] TRUE
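A minimal sketch of how ==, identical() and all.equal() differ, using 0.1 + 0.2 as an assumed stand-in for the CSV values rather than the question's data. Note that all.equal() returns a character description instead of FALSE when the values differ, so it is usually wrapped in isTRUE() inside if().

```r
a <- 0.1 + 0.2

a == 0.3            # FALSE: the two doubles differ in the last bits
identical(a, 0.3)   # FALSE: identical() is an exact comparison too
all.equal(a, 0.3)   # TRUE: within the default tolerance, sqrt(.Machine$double.eps)
all.equal(1, 1.1)   # "Mean relative difference: 0.1", not FALSE

# Safe pattern for conditionals
if (isTRUE(all.equal(a, 0.3))) message("equal within tolerance")
```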
philsf